
J.  Komorowski  Z.W.  Ras  (Eds.) 

Methodologies  for 
Intelligent  Systems 

7th  International  Symposium,  ISMIS  ’93 
Trondheim,  Norway,  June  15-18,  1993 
Proceedings 


Springer-Verlag 

Berlin  Heidelberg  New  York 
London  Paris  Tokyo 
Hong  Kong  Barcelona 
Budapest 




Lecture  Notes  in  Artificial  Intelligence 

Subseries  of  Lecture  Notes  in  Computer  Science 
Edited by J. Siekmann

Lecture  Notes  in  Computer  Science 

Edited  by  G.  Goos  and  J.  Hartmanis 


Series  Editor 


Jörg Siekmann
University  of  Saarland 

German Research Center for Artificial Intelligence (DFKI)
Stuhlsatzenhausweg 3, W-6600 Saarbrücken 11, FRG


Volume  Editors 

Jan  Komorowski 
Knowledge  Systems  Group 

Faculty of Computer Science and Electrical Engineering

The  Norwegian  Institute  of  Technology,  The  University  of  Trondheim 

N-7034 Trondheim, Norway

Zbigniew W. Raś

Department  of  Computer  Science 
University  of  North  Carolina 
Charlotte,  NC  28223,  USA 


CR Subject Classification (1991): I.2

ISBN 3-540-56804-2 Springer-Verlag Berlin Heidelberg New York
ISBN  0-387-56804-2  Springer-Verlag  New  York  Berlin  Heidelberg 


This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, re-use
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright
Law.

©  Springer-Verlag  Berlin  Heidelberg  1993 
Printed  in  Germany 

Typesetting:  Camera  ready  by  author 

Printing and binding: Druckhaus Beltz, Hemsbach/Bergstr.

45/3140-543210  -  Printed  on  acid-free  paper 


Preface 


This  volume  contains  papers  which  were  selected  for  presentation  at  the  Seventh 
International  Symposium  on  Methodologies  for  Intelligent  Systems  -  ISMIS’93, 
held  in  Trondheim,  Norway,  June  15-18,  1993.  The  symposium  was  hosted  by  the 
Norwegian Institute of Technology and sponsored by The University of Trondheim,
NFR/NTNF - The Norwegian Research Council, UNC-Charlotte, Office of
Naval  Research,  Oak  Ridge  National  Laboratory  and  ESPRIT  BRA  Compulog 
Network  of  Excellence. 

ISMIS  is  a  conference  series  that  was  started  in  1986  in  Knoxville,  Tennessee. 
It  has  since  then  been  held  in  Charlotte,  North  Carolina,  once  in  Knoxville,  and 
once  in  Torino,  Italy. 

The Organizing Committee selected the following major areas
for ISMIS’93:

-  Approximate  Reasoning 

-  Constraint  Programming 

-  Expert  Systems 

-  Intelligent  Databases 

-  Knowledge  Representation 

-  Learning  and  Adaptive  Systems 

-  Manufacturing 

-  Methodologies 

The  contributed  papers  were  selected  from  more  than  120  full  draft  papers 
by  the  following  Program  Committee:  Jens  Balchen  (NTH,  Norway),  Alan  W. 
Biermann  (Duke,  USA),  Alan  Bundy  (Edinburgh,  Scotland),  Jacques  Calmet 
(Karlsruhe,  Germany),  Jaime  Carbonell  (Carnegie-Mellon,  USA),  David  Hislop 
(US  Army  Research  Office),  Eero  Hyvonen  (VTT,  Finland),  Marek  Karpinski 
(Bonn,  Germany),  Yves  Kodratoff  (Paris  VI,  France),  Jan  Komorowski  (NTH, 
Norway),  Kurt  Konolige  (SRI  International,  USA),  Catherine  Lassez  (Yorktown 
Heights, USA), Lennart Ljung (Linkoping, Sweden), Ramon Lopez de Mantaras
(CSIC, Spain), Alberto Martelli (Torino, Italy), Ryszard Michalski (George
Mason, USA), Jack Minker (Maryland, USA), Rohit Parikh (CUNY, USA), Judea
Pearl (UCLA, USA), Don Perlis (Maryland, USA), Francois G. Pin (ORNL,
USA), Henri Prade (Toulouse, France), Zbigniew W. Ras (UNC, USA), Barry
Richards (Imperial College, UK), Colette Rolland (Paris I, France), Lorenza
Saitta (Trento, Italy), Erik Sandewall (Linkoping, Sweden), Richmond
Thomason (Pittsburgh, USA), Enn Tyugu (KTH, Sweden), Ralph Wachter (ONR,
USA), S.K. Michael Wong (Regina, Canada), Erling Woods (SINTEF, Norway),
Maria Zemankova (NSF, USA) and Jan Zytkow (Wichita State, USA).
Additionally, we acknowledge the help in reviewing the papers from: M.
Beckerman, Sanjiv Bhatia, Jianhua Chen, Stephen Chenoweth, Bill Chu, Bipin
Desai, Keith Downing, Doug Fisher, Melvin Fitting, Theresa Gaasterland,
Attilio Giordana, Charles Glover, Diana Gordon, Jerzy Grzymala-Busse, Cezary
Janikow,
Kien-Chung  Kuo,  Rei-Chi  Lee,  Charles  Ling,  Anthony  Maida,  Stan  Matwin, 




Neil  Murray,  David  Mutchler,  Jan  Plaza,  Helena  Rasiowa,  Steven  Salzberg,  P.F. 
Spelt,  David  Reed,  Michael  Sobolewski,  Stan  Szpakowicz,  Zbigniew  Stachniak, 
K. Thirunarayan, Marianne Winslett, Agata Wrzos-Kamińska, Jacek
Wrzos-Kamiński, Jing Xiao, Wlodek Zadrozny and Wojtek Ziarko.

The  Symposium  was  organized  by  the  Knowledge  Systems  Group  of  the 
Department  of  Computer  Systems  and  Telematics,  The  Norwegian  Institute  of 
Technology.  The  Congress  Department  of  the  Institute  provided  the  secretariat 
of  the  Symposium.  The  Organizing  Committee  consisted  of  Jan  Komorowski, 
Zbigniew  W.  Ras  and  Jacek  Wrzos-Kaminski. 

We  wish  to  express  our  thanks  to  Francois  Bry,  Lennart  Ljung,  Michael 
Lowry,  Jack  Minker,  Luc  De  Raedt  and  Erik  Sandewall  who  presented  the  invited 
addresses  at  the  symposium.  We  would  also  like  to  express  our  appreciation  to 
the  sponsors  of  the  symposium  and  to  all  who  submitted  papers  for  presentation 
and  publication  in  the  proceedings.  Special  thanks  are  due  to  Alfred  Hofmann 
of Springer-Verlag for his help and support.

Finally,  we  would  like  to  thank  Jacek  Wrzos-Kaminski  whose  contribution  to 
organizing  this  symposium  was  essential  to  its  becoming  a  success. 


March  1993 


J.  Komorowski,  Z.W.  Ras 


Table  of  Contents 


Invited  Talk  I 

J. Minker & C. Ruiz

On  Extended  Disjunctive  Logic  Programs  . 1 

Logic  for  Artificial  Intelligence  I 

H.  Chu  &  D.A.  Plaisted 

Model Finding Strategies in Semantically Guided Instance-Based

Theorem  Proving  . 19 

D.R.  Busch 

An  Expressive  Three-Valued  Logic  with  Two  Negations  . 23 

J.  Posegga 

Compiling  Proof  Search  in  Semantic  Tableaux  . 39 

R. Hähnle

Short CNF in Finitely-Valued Logics . 49

L.  Giordano 

Defining  Variants  of  Default  Logic:  a  Modal  Approach  . 59 

Expert  Systems 

L.-Y.  Shue  &  R.  Zamani 

An  Admissible  Heuristic  Search  Algorithm  . 69 

S.-J. Lee & C.-H. Wu

Building  an  Expert  System  Language  Interpreter  with  the  Rule 

Network  Technique  . 76 

G.  Valiente 

Input-Driven  Control  of  Rule-Based  Expert  Systems  . 86 

B. Lopez & E. Plaza

Case-Based  Planning  for  Medical  Diagnosis  . 96 

J.P. Klut & J.H.P. Eloff

MethoDex:  A  Methodology  for  Expert  Systems  Development  . 106 

Invited  Talk  II 

F.  Bry 

Towards  Intelligent  Databases  . 116 




Logic  for  Artificial  Intelligence  II 

L. Padgham & B. Nebel

Combining  Classification  and  Nonmonotonic  Inheritance 

Reasoning:  A  First  Step  . 132 

H.  Rasiowa  &  V.W.  Marek 

Mechanical  Proof  Systems  for  Logic  II,  Consensus  Programs 

and  Their  Processing  (Extended  Abstract)  . 142 

J.  Chen 

The  Logic  of  Only  Knowing  as  a  Unified  Framework  for 

Nonmonotonic  Reasoning  . 152 

P. Lambrix & R. Rönnquist

Terminological  Logic  Involving  Time  and  Evolution: 

A Preliminary Report . 162

Intelligent  Databases 

L.V. Orman

Knowledge  Management  by  Example  . 172 

H.M. Dewan & S.J. Stolfo

System  Reorganization  and  Load  Balancing  of  Parallel  Database 

Rule  Processing  . 186 

T.  Gaasterland  &  J.  Lobo 

Using  Semantic  Information  for  Processing  Negation  and  Disjunction 
in  Logic  Programs  . 198 

P. Bosc, L. Liétard & O. Pivert

On  the  Interpretation  of  Set-Oriented  Fuzzy  Quantified  Queries 

and  Their  Evaluation  in  a  Database  Management  System  . 209 

Invited  Talk  III 

M.R. Lowry

Methodologies  for  Knowledge-Based  Software  Engineering  . 219 

Logic  for  Artificial  Intelligence  III 

N. Leone, L. Palopoli & M. Romeo

Updating  Logic  Programs  . 235 

D. Robertson, J. Agusti, J. Hesketh & J. Levy

Expressing  Program  Requirements  using  Refinement  Lattices  . 245 

R. Chadha & D.A. Plaisted

Finding  Logical  Consequences  Using  Unskolemization  . 255 




A.  Rajasekar 

Controlled  Explanation  Systems  . 265 

N.V.  Murray  &  E.  Rosenthal 

Signed  Formulas:  A  Liftable  Meta-Logic  for  Multiple-Valued  Logics  . 275 

Approximate  Reasoning 

S. Tano, W. Okamoto & T. Iwatani

New  Design  Concepts  for  the  FLINS-Fuzzy  Lingual  System: 

Text-Based  and  Fuzzy-Centered  Architectures  . 285 

A. Skowron

Boolean  Reasoning  for  Decision  Rules  Generation  . 295 

C.W.R.  Chau,  P.  Lingras  &  S.K.M.  Wong 

Upper  and  Lower  Entropies  of  Belief  Functions  Using  Compatible 
Probability  Functions  . 306 

C.J.  Liau  &  B.I-P.  Lin 

Reasoning  About  Higher  Order  Uncertainty  in  Possibilistic  Logic  . 316 

C.M.  Rauszer 

Approximation  Methods  for  Knowledge  Representation  Systems  . 326 

Invited  Talk  IV 

L.  Ljung 

Modelling  of  Industrial  Systems  . 338 

Constraint  Programming 

J.-F.  Puget 

On the Satisfiability of Symmetrical Constrained Satisfaction Problems . 350

A.L. Brown Jr., S. Mantha & T. Wakayama

A  Logical  Reconstruction  of  Constraint  Relaxation  Hierarchies 

in  Logic  Programming  . 362 

P.  Berlandier 

A  Performance  Evaluation  of  Backtrack-Bounded  Search  Methods 

for  N-ary  Constraint  Networks  . 375 

M.A. Meyer & J.P. Müller

Finite  Domain  Consistency  Techniques:  Their  Combination 

and  Application  in  Computer-Aided  Process  Planning  . 385 




Learning  and  Adaptive  Systems  I 

I.F. Imam & R.S. Michalski

Should  Decision  Trees  be  Learned  from  Examples  or  from 

Decision  Rules?  . 395 

H. Lounis

Integrating Machine-Learning Techniques in Knowledge-Based

Systems  Verification  . 405 

R. Bagai, V. Shanbhogue, J.M. Zytkow & S.C. Chou

Automatic  Theorem  Generation  in  Plane  Geometry  . 415 

A. Giordana, L. Saitta & C. Baroglio

Learning  Simple  Recursive  Theories  . 425 

Invited  Talk  V 

L. De Raedt & N. Lavrač

The  Many  Faces  of  Inductive  Logic  Programming  . 435 

Methodologies 

M. Bateman, S. Martin & A. Slade

CONSENSUS:  A  Method  for  the  Development  of  Distributed 

Intelligent  Systems  . 450 

H.  Gan 

Script  and  Frame:  Mixed  Natural  Language  Understanding  System 

with  Default  Theory  . 466 

M. Franova, Y. Kodratoff & M. Gross

Contractive  Matching  Methodology:  Formally  Creative  or  Intelligent 
Inductive  Theorem  Proving?  . 476 

G. Grosz & C. Rolland

Representing  the  Knowledge  Used  During  the  Requirement  Engineering 
Activity  with  Generic  Structures  . 486 

S. Caselli, A. Natali & F. Zanichelli

Development of a Programming Environment for Intelligent Robotics . 496

Knowledge Representation

A. Schaerf

On  the  Complexity  of  the  Instance  Checking  Problem  in  Concept 
Languages  with  Existential  Quantification  . 508 

S.  Ambroszkiewicz 

Mutual  Knowledge  . 518 




K. Thirunarayan

Expressive Extensions to Inheritance Networks . 528

G. Bittencourt

A Connectionist-Symbolic Cognitive Model . 538

M. Di Manzo & E. Giunchiglia

Multi-Context  Systems  as  a  Tool  to  Model  Temporal  Evolution  . 548 

Invited  Talk  VI 

E. Sandewall

Systematic  Assessment  of  Temporal  Reasoning  Methods  for  Use  in 
Autonomous  Agents  . 558 

Manufacturing 

Ch. Klauck & J. Schwagereit

GGD:  Graph  Grammar  Developer  for  Features  in  CAD/CAM  . 571 

K.  Wang 

A Knowledge-Based Approach to Group Analysis in Automated
Manufacturing  Systems  . 581 

H  - r.B.  Chu  H.  Du 

CENTER: A System Architecture for Matching Design and

Manufacturing  . 591 

M.  Sobolewski 

Knowledge-Based  System  Integration  in  a  Concurrent  Engineering 
Environment . 601

Learning  and  Adaptive  Systems  II 

P.  Charlton 

A  Reflective  Strategic  Problem  Solving  Model  . 612 

B. Wüthrich

On  the  Learning  of  Rule  Uncertainties  and  Their  Integration 

into  Probabilistic  Knowledge  Bases  . 622 

R.  Zembowicz  &  J.M.  Zytkow 

Recognition  of  Functional  Dependencies  in  Data  . 632 

R. Słowiński

Rough  Set  Learning  of  Preferential  Attitude  in  Multi-Criteria 

Decision  Making  . 642 


Author Index . 653


On  Extended  Disjunctive  Logic  Programs 


Jack Minker¹,² and Carolina Ruiz¹

¹ Department of Computer Science.

² Institute for Advanced Computer Studies.
University of Maryland, College Park, MD 20742, U.S.A.
{minker, cruizc}@cs.umd.edu


Abstract. This paper studies, in a comprehensive manner, different aspects
of extended disjunctive logic programs, that is, programs whose clauses are
of the form $l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$,
where $l_1, \dots, l_n$ are literals (i.e. atoms and classically negated atoms),
and not is the negation-by-default operator. The explicit use of classical
negation suggests the introduction of a new truth value, namely, logical
falsehood (in contrast to falsehood-by-default) in the semantics. General
techniques are described for extending the model, fixpoint, and proof
theories of an arbitrary semantics of normal disjunctive logic programs
to cover the class of extended programs. Illustrations of these techniques
are given for stable models, disjunctive well-founded and stationary
semantics. Also, the declarative complexity of the extended programs as
well as the algorithmic complexity of the proof procedures are discussed.


1  Introduction 

Logic programming, as an approach to the use of logic in knowledge
representation and reasoning, has gone through different stages. First, logic
programs containing only Horn clauses were considered. A Horn clause is a
disjunction of literals in which at most one literal is positive and can be written
either as “$a \leftarrow b_1, \dots, b_m$” or as “$\leftarrow b_1, \dots, b_m$”, where $a, b_1, \dots, b_m$
are atoms and $m \ge 0$. The semantics of these programs is well understood (see
[31, 15]) and is captured by the unique minimal Herbrand model of the program.
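For a finite ground Horn program, that unique minimal model can be computed by iterating the immediate-consequence operator $T_P$ from the empty interpretation until nothing new is derived. The Python sketch below assumes a propositional program encoded as (head, body) pairs; the names `HornClause` and `least_model` are ours, not the paper's.

```python
from typing import FrozenSet, Iterable, Tuple

# A ground Horn clause "a <- b1, ..., bm" is a pair (head, body); facts have an
# empty body. Goal clauses "<- b1, ..., bm" derive no atoms and are left out.
HornClause = Tuple[str, Tuple[str, ...]]


def least_model(program: Iterable[HornClause]) -> FrozenSet[str]:
    """Least Herbrand model of a finite ground Horn program.

    Iterates T_P from the empty interpretation until no new atom is
    derived, which yields the least fixpoint of T_P.
    """
    clauses = list(program)
    model: set = set()
    while True:
        # T_P(I): heads of all clauses whose bodies hold in I
        derived = {head for head, body in clauses if all(b in model for b in body)}
        if derived <= model:
            return frozenset(model)
        model |= derived


# p.   q <- p.   r <- q, s.
example = [("p", ()), ("q", ("p",)), ("r", ("q", "s"))]
print(sorted(least_model(example)))  # ['p', 'q']
```

Because $T_P$ is monotone, each pass can only add atoms, so the loop terminates after at most as many passes as there are atoms.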

It is clear that since only positive atoms occur in (the head of) Horn clauses,
no negative information can be inferred from these programs unless some strategy
or rule for deriving negative information is adopted. Two rules for negation
were initially proposed for Horn programs: the Closed World Assumption
(CWA) [28], which states that an atom can be assumed to be false if it cannot be
proven to be true; and the Clark completion theory [7], which assumes that the
definition of each atom in a program is complete in the sense that it specifies all
the circumstances under which the atom is true and only such circumstances, so
the atom can be inferred false otherwise.
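A minimal sketch of the CWA for a finite ground Horn program, assuming a (head, body) encoding of clauses and restricting attention to atoms that occur in P: an atom is assumed false exactly when it is not provable, i.e. not in the least Herbrand model. The helper names are illustrative only.

```python
from typing import FrozenSet, List, Tuple

HornClause = Tuple[str, Tuple[str, ...]]  # (head, body) of a ground Horn clause


def least_model(program: List[HornClause]) -> FrozenSet[str]:
    """Least fixpoint of T_P, computed by naive iteration."""
    model: set = set()
    while True:
        derived = {h for h, body in program if all(b in model for b in body)}
        if derived <= model:
            return frozenset(model)
        model |= derived


def cwa_false_atoms(program: List[HornClause]) -> FrozenSet[str]:
    """Atoms of P that the CWA assumes false: those not in the least model."""
    atoms = {h for h, _ in program} | {b for _, body in program for b in body}
    return frozenset(atoms - least_model(program))


example = [("p", ()), ("q", ("p",)), ("r", ("q", "s"))]
print(sorted(cwa_false_atoms(example)))  # ['r', 's']
```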

Having a rule for negation, it is plausible to extend Horn clauses to make use
of negative information. This is the purpose of the so-called negation-by-default
operator not, which may appear in the bodies of clauses. These clauses are called
normal clauses and are of the form “$a \leftarrow b_1, \dots, b_m, \mathit{not}\ c_1, \dots, \mathit{not}\ c_n$”, where
$a, b_1, \dots, b_m, c_1, \dots, c_n$ are atoms and $m, n \ge 0$. This kind of negation is limited,
however, in the sense that not p does not refer to the presence of knowledge
asserting the falsehood of the atom p but only to the lack of evidence about its
truth. Indeed, some authors have translated not p as “p is not believed” [14], “p
is not known” [10], and “there is no evidence that p is true” [9], in addition to
the common translation “p is not provable from the program in question”.

In contrast to the Horn case, there is no agreement on a unique semantics
for normal programs, since there can be as many different semantics as there
are ways to interpret the meaning of not. Among the proposed semantics are
the perfect model semantics [24], the stable model semantics [11], and the
well-founded semantics (WFS) [32].

Another generalization of Horn clauses that allows disjunctions of atoms
in the heads of clauses has been studied extensively (see [16]). These clauses
are called disjunctive clauses and are of the following form: “$a_1 \vee \dots \vee a_k \leftarrow
b_1, \dots, b_m$”, where $a_1, \dots, a_k, b_1, \dots, b_m$ are atoms and $k, m \ge 0$. The meaning
of such a program is captured by its set of minimal Herbrand models. Several
rules for negation have been introduced for disjunctive logic programs: the
Generalized Closed World Assumption (GCWA) [20], which assumes that an atom
is false when it does not belong to any of the minimal Herbrand models of
the program; the Extended Generalized Closed World Assumption (EGCWA)
[33], which applies exactly the same criterion as the GCWA but to conjunctions
of atoms instead of only atoms (see Sect. 4); and the Weak Generalized Closed
World Assumption (WGCWA) [27] (or equivalently, the Disjunctive Database
Rule (DDR) [29]), which states that an atom can be assumed to be false when
it does not appear in any disjunction derivable from the program.
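To make the GCWA concrete, the sketch below enumerates the minimal Herbrand models of a small ground positive disjunctive program by brute force and returns the atoms lying in no minimal model. It is exponential in the number of atoms and meant only as an illustration; encoding a clause as a (heads, body) pair of frozensets is an assumption of this sketch.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# Ground positive disjunctive clause "a1 v ... v ak <- b1, ..., bm" as (heads, body).
DClause = Tuple[FrozenSet[str], FrozenSet[str]]


def satisfies(interp: FrozenSet[str], program: List[DClause]) -> bool:
    # A clause holds if its body is not contained in interp or some head atom is.
    return all(not body <= interp or bool(heads & interp) for heads, body in program)


def minimal_models(program: List[DClause]) -> List[FrozenSet[str]]:
    """All minimal Herbrand models, by exhaustive search (illustration only)."""
    atoms = sorted(set().union(*(h | b for h, b in program)))
    models = [frozenset(c) for r in range(len(atoms) + 1)
              for c in combinations(atoms, r) if satisfies(frozenset(c), program)]
    return [m for m in models if not any(n < m for n in models)]


def gcwa_false_atoms(program: List[DClause]) -> FrozenSet[str]:
    """GCWA: an atom is assumed false iff it belongs to no minimal model."""
    atoms = set().union(*(h | b for h, b in program))
    mins = minimal_models(program)
    return frozenset(a for a in atoms if all(a not in m for m in mins))


# a v b.   c <- d.
P = [(frozenset({"a", "b"}), frozenset()), (frozenset({"c"}), frozenset({"d"}))]
print(minimal_models(P))            # [frozenset({'a'}), frozenset({'b'})]
print(sorted(gcwa_false_atoms(P)))  # ['c', 'd']
```

Here the CWA would assume both $\neg a$ and $\neg b$ (neither is provable), contradicting $a \vee b$; the GCWA assumes only $\neg c$ and $\neg d$.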

Negative information can be introduced in disjunctive clauses in the same
fashion as in Horn clauses. The resulting clauses are called normal disjunctive
clauses and are of the form “$a_1 \vee \dots \vee a_k \leftarrow b_1, \dots, b_m, \mathit{not}\ c_1, \dots, \mathit{not}\ c_n$”, where
$a_1, \dots, a_k, b_1, \dots, b_m, c_1, \dots, c_n$ are atoms and $k, m, n \ge 0$. Various semantics
have also been proposed for normal disjunctive logic programs (henceforth
denoted by ndlps), among others the stable disjunctive model semantics [23],
the disjunctive well-founded semantics (DWFS) [2], the generalized disjunctive
well-founded semantics (GWFS) [3, 4], WF³ [5], and the stationary semantics
[26].

It is worth noting that normal clauses are particular cases of normal
disjunctive clauses. Therefore any semantics defined for the class of normal
disjunctive logic programs is also a semantics for the class of normal logic programs.

An alternative to overcome some of the difficulties of dealing with negative
information is to make explicit use of classical negation in addition to
negation-by-default. In this way, the expressive power of logic programs is increased since the
user is now allowed to state not only when an atom is true but also when it is false
(without any ambiguity or default interpretation). Clauses obtained by explicitly
using the classical negation operator ($\neg$) are called extended disjunctive clauses
and are of the following form: “$l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$”,
where $l_1, \dots, l_n$ are literals (i.e. atoms and classically negated atoms) and
$0 \le k \le m \le n$. Hence, extended disjunctive clauses contain two forms of negation:
classical and default.

Previous  contributions  in  this  area  include  the  following:  Pearce  and  Wagner 
[22]  added  explicit  negative  information  to  Prolog  programs.  They  showed  that 
there  is  no  need  to  alter  the  computational  structure  of  such  programs  to  include 
classical  negation  since  there  is  a  way  to  transform  extended  programs  to  positive 
ones  which  preserves  the  meaning  of  the  programs.  Gelfond  and  Lifschitz  [12] 
extended  their  stable  model  semantics  to  cover  classical  negation.  Przymusinski 
[25]  generalized  this  extended  version  of  the  stable  model  semantics  to  include 
disjunctive  programs.  Alferes  and  Pereira  [1]  provided  a  framework  to  compare 
the  behavior  of  the  different  semantics  in  the  presence  of  two  kinds  of  negation. 

The purpose of this paper is to study, in a comprehensive manner, different
aspects of extended disjunctive logic programs (edlps for short). We describe
general techniques to deal with this extended class of programs and also survey
some of the results in the field.

Alternative semantics for edlps can be obtained by extending the semantics
known for the class of normal disjunctive logic programs. Since there are now
two different notions of falsehood in extended programs, we distinguish between
them by saying that, with respect to some semantics, a formula $\varphi$ is
false-by-default in an edlp P if $\mathit{not}\ \varphi$ is provable from P, i.e. $\varphi$ is assumed to be false
by the particular rule for negation used by the semantics; and is logically false
(or simply false) if $\neg\varphi$ is provable from P, or in other words, if $\neg\varphi$ is a logical
consequence of P. We extend each semantics to include a new truth value: logical
falsehood.

With the introduction of negated atoms in the heads of the clauses, it is
possible to specify inconsistent theories, that is, to describe situations in which
some atom p and its complement $\neg p$ are true simultaneously. Therefore, we must
develop techniques to recognize when a program is inconsistent with respect to
a given semantics and to deal with such an inconsistent program.

Since the techniques to be explained are general enough to be applied to any
semantics of ndlps, we will describe them in terms of a generic such semantics,
which we call SEM. In addition, we will illustrate the application of these
techniques to the stable model semantics (covering in this way the perfect model
semantics), DWFS (which covers the WFS and the minimal models semantics
for disjunctive logic programs), and the stationary semantics.

The paper is organized as follows: Section 2 introduces the notation and
definitions needed in the following sections. Section 3 describes a standard
procedure to extend the model theoretical characterization of an arbitrary semantics
of ndlps to the whole class of edlps. It also includes an illustration of this
technique for the case of the stable model semantics. Section 4 constructs a
fixpoint operator to compute the extended version of a semantics SEM in terms
of a fixpoint operator which computes the restriction of this semantics to ndlps.
Illustrations are given for the DWFS and the stationary semantics. Section 5
describes a procedure to answer queries with respect to edlps and an arbitrary
semantics SEM. This procedure uses, as a subroutine, a procedure to answer
queries with respect to the restriction of SEM to ndlps. Section 6 studies the
complexities of some fundamental problems related to edlps.

2  Syntax  and  Definitions 

In  this  section  we  formalize  the  definition  of  extended  disjunctive  logic  programs 
and  introduce  some  of  the  notation  needed  in  the  following  sections. 

An extended disjunctive logic program, edlp, is a (possibly infinite) set of
clauses of the form $l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$, where $l_1, \dots, l_n$
are literals (i.e. atoms and classically negated atoms), $0 \le k \le m \le n$, and not
is the negation-by-default operator.

Example  1.  The  following  is  an  extended  disjunctive  logic  program: 

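$$
P = \{\; a \vee c;\quad c \leftarrow a,\ \mathit{not}\ b;\quad \neg b \leftarrow \neg c;\quad b \leftarrow e,\ \mathit{not}\ c;\quad \neg a \leftarrow \mathit{not}\ a \;\}.
$$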
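One possible machine representation of such programs, sketched in Python: a clause keeps its head, its positive body and its not-body as frozensets of literal strings, with a leading `-` standing for classical negation. This encoding, the helper names, and the reading of Example 1 used here (in particular its clauses $\neg b \leftarrow \neg c$ and $b \leftarrow e, \mathit{not}\ c$) are assumptions of the sketch.

```python
from typing import FrozenSet, List, NamedTuple


class Clause(NamedTuple):
    """l1 v ... v lk <- l_{k+1}, ..., l_m, not l_{m+1}, ..., not l_n"""
    head: FrozenSet[str]  # l1, ..., lk
    pos: FrozenSet[str]   # l_{k+1}, ..., l_m
    naf: FrozenSet[str]   # literals under negation-by-default


def clause(head=(), pos=(), naf=()) -> Clause:
    return Clause(frozenset(head), frozenset(pos), frozenset(naf))


def neg(lit: str) -> str:
    """Classical complement; neg(neg(x)) == x, matching the reading of --p as p."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def literals_of(program: List[Clause]) -> FrozenSet[str]:
    """The set of literals over the predicates of P (both signs)."""
    preds = {lit.lstrip("-") for c in program for lit in c.head | c.pos | c.naf}
    return frozenset(preds | {"-" + p for p in preds})


# Example 1; "-x" stands for the classical negation of x.
EXAMPLE_1 = [
    clause(head=["a", "c"]),
    clause(head=["c"], pos=["a"], naf=["b"]),
    clause(head=["-b"], pos=["-c"]),
    clause(head=["b"], pos=["e"], naf=["c"]),
    clause(head=["-a"], naf=["a"]),
]
print(sorted(literals_of(EXAMPLE_1)))
# ['-a', '-b', '-c', '-e', 'a', 'b', 'c', 'e']
```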

We assume the convention that any occurrence of $\neg\neg p$ is simplified to p.
Since  a  non-ground  clause  is  equivalent  to  the  set  of  all  its  ground  instances, 
we  consider  here  only  ground  programs  (i.e.  propositional  programs).  This  is 
done  only  to  simplify  the  notation  without  any  loss  of  generality. 
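Grounding amounts to instantiating every variable of a clause with every term of the Herbrand universe. The sketch below does this for function-free clauses over a finite set of constants (with function symbols the universe, and hence the ground program, may be infinite). The uppercase-variable convention, the flat atom-sequence view of a clause (head/body roles do not affect substitution), and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# An atom is (predicate, arg1, ..., argn). By an assumed convention here,
# arguments starting with an uppercase letter are variables, others constants.
Atom = Tuple[str, ...]


def variables(atoms: Sequence[Atom]) -> List[str]:
    """Variables of a clause, in order of first occurrence."""
    seen: List[str] = []
    for atom in atoms:
        for term in atom[1:]:
            if term[:1].isupper() and term not in seen:
                seen.append(term)
    return seen


def ground_instances(clause: Sequence[Atom],
                     universe: Sequence[str]) -> List[Tuple[Atom, ...]]:
    """One ground instance per assignment of constants to the clause's variables."""
    vs = variables(clause)
    instances = []
    for values in product(universe, repeat=len(vs)):
        theta: Dict[str, str] = dict(zip(vs, values))
        instances.append(tuple((a[0],) + tuple(theta.get(t, t) for t in a[1:])
                               for a in clause))
    return instances


# p(X) <- q(X, Y) over constants {a, b}: 2 * 2 = 4 ground clauses.
for inst in ground_instances([("p", "X"), ("q", "X", "Y")], ["a", "b"]):
    print(inst)
```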

Given a program P, $L_P$ denotes the set of predicate symbols that occur in P;
$\mathcal{L}$ denotes the set of all ground literals that can be constructed with predicates
in $L_P$; and $\mathcal{U}$ will denote the Herbrand universe associated with $L_P$.

In the context of ndlps, $DHB_P$ (resp. $CHB_P$) denotes the disjunctive Herbrand
base (resp. conjunctive Herbrand base) of P, that is, the set of equivalence
classes of disjunctions (resp. conjunctions) of atoms appearing in P modulo
logical equivalence. This notion is generalized to edlps by $\mathcal{DL}_P$ (resp. $\mathcal{CL}_P$), the set
of equivalence classes of disjunctions (resp. conjunctions) of literals in $\mathcal{L}$ modulo
logical equivalence.³
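For disjunctions of literals, a class modulo logical equivalence has a simple canonical representative: order and repetition of disjuncts are irrelevant, two non-tautological disjunctions of literals are equivalent exactly when they contain the same literals, and every disjunction containing a complementary pair such as $p \vee \neg p$ is valid. Dually, every conjunction with a complementary pair is unsatisfiable. A Python sketch built on these observations, with `-p` for $\neg p$ and the marker values `VALID` and `UNSAT` chosen by us:

```python
from typing import FrozenSet, Iterable, Union

# Hypothetical markers for the single class of valid disjunctions and the
# single class of unsatisfiable conjunctions.
VALID = "VALID"
UNSAT = "UNSAT"


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def has_complementary_pair(lits: FrozenSet[str]) -> bool:
    return any(neg(lit) in lits for lit in lits)


def disjunction_class(lits: Iterable[str]) -> Union[FrozenSet[str], str]:
    """Canonical representative of [l1 v ... v lk] modulo logical equivalence."""
    s = frozenset(lits)
    return VALID if has_complementary_pair(s) else s


def conjunction_class(lits: Iterable[str]) -> Union[FrozenSet[str], str]:
    """Canonical representative of [l1 & ... & lk] modulo logical equivalence."""
    s = frozenset(lits)
    return UNSAT if has_complementary_pair(s) else s


print(disjunction_class(["a", "-b", "a"]) == disjunction_class(["-b", "a"]))  # True
print(disjunction_class(["p", "-p", "q"]))  # VALID
```

Over atoms only, as in $DHB_P$ and $CHB_P$, no complementary pair can occur, so the representative is always just the set of atoms.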

As noted before, extended programs enable us to state not only when a
predicate p holds but also when $\neg p$ holds. In this sense, one can regard p and $\neg p$
as different predicates which happen to be complementary (i.e. they cannot both
be true or both be false at once). Using this idea, Pearce and Wagner in [22] and
Gelfond and Lifschitz in [13] showed how to transform extended normal clauses
into normal clauses. This is done by representing every negative literal $\neg p$ in a
program by a new predicate, say p', with the restriction that p and p' cannot
hold simultaneously. This restriction may be viewed as an integrity constraint.
Formally, we define the prime transformation l' of a literal l to be:

$$
l' = \begin{cases} p, & \text{if } l = p \text{ for some predicate } p,\\ p', & \text{if } l = \neg p \text{ for some predicate } p. \end{cases}
$$

³ For simplicity, we will write d as an abbreviation for the equivalence class [d].




Notice  that  if  we  apply  this  prime  transformation  to  every  literal  occurring 
in  an  edlp  P,  we  obtain  a  normal  disjunctive  logic  program  P'.  The  union  of  P' 
with  the  following  set  of  integrity  constraints  captures  the  same  meaning  of  P: 

$$IC_{P'} = \{\, \Leftarrow p, p' \;:\; p \in L_P \,\}$$

These integrity constraints state that p and p' are in fact complementary
predicates. We use here the symbol $\Leftarrow$ instead of $\leftarrow$ to emphasize that these integrity
constraints are not clauses of the program, i.e. $IC_{P'}$ is not contained in P'.
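A sketch of the prime transformation and of $IC_{P'}$ in Python, assuming literals are strings in which `-p` denotes $\neg p$ and the fresh predicate p' is spelled by appending a quote character. The clause layout (head, positive body, not-body) and all function names are illustrative.

```python
from typing import FrozenSet, List, Set, Tuple

# A clause is (head, pos, naf), each a frozenset of literal strings.
Clause = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def prime(lit: str) -> str:
    """l' = p if l = p, and p' if l = -p."""
    return lit[1:] + "'" if lit.startswith("-") else lit


def prime_program(program: List[Clause]) -> List[Clause]:
    """Apply the prime transformation to every literal of every clause."""
    return [tuple(frozenset(prime(lit) for lit in part) for part in c) for c in program]


def integrity_constraints(program: List[Clause]) -> Set[Tuple[str, str]]:
    """IC_{P'}: one pair (p, p') per predicate p of P, read as '<= p, p''."""
    preds = {lit.lstrip("-") for c in program for part in c for lit in part}
    return {(p, p + "'") for p in preds}


def satisfies_ic(interp: FrozenSet[str], ics: Set[Tuple[str, str]]) -> bool:
    """True iff no constraint has both of its predicates in interp."""
    return not any(p in interp and q in interp for p, q in ics)


P = [(frozenset({"-b"}), frozenset({"-c"}), frozenset())]   # -b <- -c
print(prime_program(P))                 # [(frozenset({"b'"}), frozenset({"c'"}), frozenset())]
print(sorted(integrity_constraints(P)))  # [('b', "b'"), ('c', "c'")]
```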

In the same spirit, $\mathcal{L}'$ denotes the set of prime literals $\{l' : l \in \mathcal{L}\}$ and will
be taken as the set of predicate symbols appearing in P', i.e. $L_{P'} =_{\mathrm{def}} \mathcal{L}'$.

Sometimes we need to recover program P from P'. In order to do so, we
define the neg transformation on predicates by:

$$
l^{\sim} = \begin{cases} p, & \text{if } l = p \text{ for some predicate } p,\\ \neg p, & \text{if } l = p' \text{ for some predicate } p, \end{cases}
$$

which is extended to programs in the usual way.

It is clear that for any edlp P, $(P')^{\sim} = P$ and for any ndlp Q, $(Q^{\sim})' = Q$.
Also, it is worth noting that the prime transformation is not strictly needed.
Instead of performing the prime transformation, we can treat $\neg a$ as if it were an
atom independent of a. However, we will use this transformation in order to make
explicit when an edlp P is thought of as a normal disjunctive logic program.
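The round-trip identities $(P')^{\sim} = P$ and $(Q^{\sim})' = Q$ can be checked mechanically on a small program. The sketch assumes a string encoding in which `-p` denotes $\neg p$ and `p'` the primed predicate, and that no original predicate name ends in a quote; `unprime` plays the role of the neg transformation, and all names are illustrative.

```python
from typing import Callable, FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (head, pos, naf)


def prime(lit: str) -> str:
    """-p -> p' (assumes source predicate names never end in a quote)."""
    return lit[1:] + "'" if lit.startswith("-") else lit


def unprime(lit: str) -> str:
    """The neg transformation on literals: p' -> -p, p -> p."""
    return "-" + lit[:-1] if lit.endswith("'") else lit


def transform(program: List[Clause], f: Callable[[str], str]) -> List[Clause]:
    """Lift a literal-level map to programs, clause part by clause part."""
    return [tuple(frozenset(f(lit) for lit in part) for part in c) for c in program]


P = [
    (frozenset({"a", "c"}), frozenset(), frozenset()),
    (frozenset({"-b"}), frozenset({"-c"}), frozenset()),
    (frozenset({"-a"}), frozenset(), frozenset({"a"})),
]
assert transform(transform(P, prime), unprime) == P    # (P')~ = P
Q = transform(P, prime)                                # an ndlp over primed names
assert transform(transform(Q, unprime), prime) == Q    # (Q~)' = Q
```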

3  Model  Theory  Semantics 

In  this  section  we  describe  a  standard  procedure  to  extend  an  arbitrary  model 
theory  semantics  of  normal  disjunctive  logic  programs  to  the  whole  class  of 
extended disjunctive logic programs. Since the procedure is general enough to
be applied to any semantics defined on the class of ndlps, we describe it in terms
of a generic such semantics, which we call SEM. In the following subsection we
illustrate the use of the technique when SEM is the stable model semantics.

We denote by interpretation any subset of the set of literals $\mathcal{L}$, and we call an
interpretation consistent only if it does not contain any pair of complementary
literals, say p and $\neg p$. The prime and neg transformations of interpretations are
defined as expected: if $M \subseteq \mathcal{L}$ then $M' = \{l' : l \in M\}$, and if $N \subseteq \mathcal{L}'$ then
$N^{\sim} = \{l^{\sim} : l \in N\}$.

Interpretations which agree with a given program (in the sense of the following
definition) are called models of the program.

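Definition 1. Let P be an edlp and let $M \subseteq \mathcal{L}$. Then M is a model of P iff M
is consistent and for each program clause $l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$
in P, if $l_{k+1}, \dots, l_m \in M$ and $l_{m+1}, \dots, l_n \notin M$, then $\exists i$, $1 \le i \le k$, such
that $l_i \in M$.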
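Definition 1 translates directly into a check over ground clauses. The Python sketch below assumes clauses are (head, pos, naf) triples of literal strings with `-` marking classical negation, and encodes Example 1 with its third and fourth clauses read as $\neg b \leftarrow \neg c$ and $b \leftarrow e, \mathit{not}\ c$; these encodings and the function names are assumptions of the sketch.

```python
from typing import FrozenSet, List, Tuple

Clause = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (head, pos, naf)


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def is_consistent(m: FrozenSet[str]) -> bool:
    return not any(neg(lit) in m for lit in m)


def is_model(m: FrozenSet[str], program: List[Clause]) -> bool:
    """Definition 1: m is consistent, and whenever all of l_{k+1..m} are in m
    and none of l_{m+1..n} is in m, some head literal l_i is in m."""
    if not is_consistent(m):
        return False
    return all(not (pos <= m and not (naf & m)) or bool(head & m)
               for head, pos, naf in program)


def clause(head=(), pos=(), naf=()) -> Clause:
    return (frozenset(head), frozenset(pos), frozenset(naf))


EXAMPLE_1 = [clause(["a", "c"]), clause(["c"], ["a"], ["b"]), clause(["-b"], ["-c"]),
             clause(["b"], ["e"], ["c"]), clause(["-a"], naf=["a"])]
print(is_model(frozenset({"c", "-a"}), EXAMPLE_1))        # True
print(is_model(frozenset({"a"}), EXAMPLE_1))              # False: c <- a, not b fails
print(is_model(frozenset({"c", "-c", "-a"}), EXAMPLE_1))  # False: inconsistent
```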

The  following  Lemma  establishes  some  relationships  between  the  models  of 
an  edlp  P  and  the  models  of  P'. 




Lemma 2. Let P be an edlp, Q be an ndlp, M be an interpretation of P, and N
be an interpretation of Q. The following properties hold:

1. M is a model of P iff M' is a model of P'.

2. N is a model of Q iff $N^{\sim}$ is a model of $Q^{\sim}$.

Notice that an inconsistent edlp P (i.e. a program from which some predicate
p and its complement $\neg p$ are both derivable) has no models. In accordance with
classical logic, every formula is deducible from an inconsistent set of axioms;
hence, we must declare the set of all literals $\mathcal{L}$ as the meaning of an inconsistent
program, and so we must extend the definition of a model.

Definition 3. Let P be an edlp and let $M \subseteq \mathcal{L}$. Then M is an extended-model
of P iff either $M = \mathcal{L}$, or M is a model of P.

Using this definition, it is easy to characterize the inconsistent edlps as those
whose only extended-model is $\mathcal{L}$.
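For a finite ground program, Definition 3 and this characterization of inconsistency can be checked by exhaustive search over the subsets of $\mathcal{L}$. The sketch below is exponential and for illustration only; it assumes clauses encoded as (head, pos, naf) triples of literal strings with `-p` standing for $\neg p$, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Clause = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]  # (head, pos, naf)


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def all_literals(program: List[Clause]) -> FrozenSet[str]:
    """The set L of literals over the predicates of P."""
    preds = {lit.lstrip("-") for c in program for part in c for lit in part}
    return frozenset(preds | {"-" + p for p in preds})


def is_model(m: FrozenSet[str], program: List[Clause]) -> bool:
    """Definition 1 (consistency plus clause satisfaction)."""
    if any(neg(lit) in m for lit in m):
        return False
    return all(not (pos <= m and not (naf & m)) or bool(head & m)
               for head, pos, naf in program)


def extended_models(program: List[Clause]) -> Set[FrozenSet[str]]:
    """Definition 3: every model of P, plus L itself."""
    lits = sorted(all_literals(program))
    models = {frozenset(s) for r in range(len(lits) + 1)
              for s in combinations(lits, r) if is_model(frozenset(s), program)}
    return models | {frozenset(lits)}


def is_inconsistent(program: List[Clause]) -> bool:
    return extended_models(program) == {all_literals(program)}


def fact(lit: str) -> Clause:
    return (frozenset({lit}), frozenset(), frozenset())


print(is_inconsistent([fact("p"), fact("-p")]))  # True: only L remains
print(is_inconsistent([fact("p")]))              # False: {p} is a model
```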

Let $\mathcal{M}_P^{SEM}$ denote the set of extended-models which characterize the
semantics SEM of a logic program P. For instance, if SEM is the stable model
semantics then $\mathcal{M}_P^{SEM}$ is the set of all stable extended-models of P (see Sect. 3.1).

An easy way to extend SEM to the class of edlps is to treat each literal in an edlp P as if it were an atom. That is, the definition of SEM is applied to P as if P were a normal disjunctive logic program containing some atoms which happen to begin with the character ¬. In this way, we obtain a set of models for P in which each inconsistent model must be replaced by $\mathcal{L}$.

The main result in this section states that if one uses the procedure just described to extend SEM, the set of extended-models which characterizes the extended semantics SEM of an edlp P (i.e. $\mathcal{M}_P^{SEM}$) can be obtained from the set of models characterizing the semantics SEM of the prime transformation of the program (i.e. $\mathcal{M}_{P'}^{SEM}$).


Theorem 4. Let P be an edlp and let M be an interpretation of P. Then $M \in \mathcal{M}_P^{SEM}$ iff either

1. M is consistent and $M' \in \mathcal{M}_{P'}^{SEM}$, or

2. $M = \mathcal{L}$ and there is some $N \in \mathcal{M}_{P'}^{SEM}$ such that N does not satisfy $IC_{P'}$.
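Theorem 4 turns the computation of $\mathcal{M}_P^{SEM}$ into a post-processing step on $\mathcal{M}_{P'}^{SEM}$. The sketch below takes the models of P' as given (how they are computed depends on SEM and is left abstract) and applies the two cases of the theorem; it relies on the observation that M is consistent exactly when M' satisfies $IC_{P'}$. The names and the "p'" spelling of primed atoms are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Set


def neg_lit(atom: str) -> str:
    """p' -> -p; unprimed atoms are unchanged."""
    return "-" + atom[:-1] if atom.endswith("'") else atom


def violates_ic(n: FrozenSet[str]) -> bool:
    """True iff N does not satisfy IC_{P'}, i.e. it contains both p and p' for some p."""
    return any(a + "'" in n for a in n if not a.endswith("'"))


def extended_models(models_of_p_prime: Iterable[FrozenSet[str]],
                    all_literals: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Build M_P^SEM from M_{P'}^SEM following the two cases of Theorem 4."""
    result: Set[FrozenSet[str]] = set()
    for n in models_of_p_prime:
        if violates_ic(n):
            result.add(all_literals)                      # case 2: M = L
        else:
            result.add(frozenset(neg_lit(a) for a in n))  # case 1: M consistent, M' = N
    return result
```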


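$\mathcal{M}_P^{SEM}$ constructed in this way characterizes the skeptical version of SEM, that is, the version in which a formula is true w.r.t. P if and only if it is true in every $M \in \mathcal{M}_P^{SEM}$. For the credulous version of SEM, in which a formula is true w.r.t. P if and only if it is true in some $M \in \mathcal{M}_P^{SEM}$, we need to reduce $\mathcal{M}_P^{SEM}$ by keeping $\mathcal{L}$ in it only if there is no other model in it. That is,

$$\mathit{reduced}(\mathcal{M}_P^{SEM}) = \begin{cases} \{\mathcal{L}\} & \text{if } \mathcal{M}_P^{SEM} = \{\mathcal{L}\},\\ \mathcal{M}_P^{SEM} - \{\mathcal{L}\} & \text{otherwise.} \end{cases}$$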


On the other hand, the definition of $\mathcal{M}_P^{SEM}$ enables us to distinguish between a program that is inconsistent w.r.t. SEM, i.e. $\mathcal{M}_P^{SEM} = \{\mathcal{L}\}$, and a program that is incoherent w.r.t. SEM, i.e. $\mathcal{M}_P^{SEM} = \emptyset$.
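The credulous reduction and the inconsistent/incoherent distinction are simple set operations once $\mathcal{M}_P^{SEM}$ is available as a finite set. A minimal sketch, assuming models are frozensets of literal strings and that the hypothetical parameter `all_literals` stands for $\mathcal{L}$:

```python
from typing import FrozenSet, Set

Model = FrozenSet[str]


def reduced(models: Set[Model], all_literals: Model) -> Set[Model]:
    """Credulous variant: keep L only when it is the sole extended-model."""
    if models == {all_literals}:
        return set(models)
    return models - {all_literals}


def status(models: Set[Model], all_literals: Model) -> str:
    """Separate the two degenerate cases discussed above."""
    if not models:
        return "incoherent"      # M_P^SEM is empty
    if models == {all_literals}:
        return "inconsistent"    # M_P^SEM = {L}
    return "neither"
```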


3.1  Stable  Model  Semantics 

The  stable  model  semantics  for  normal  logic  programs  was  introduced  by  Gelfond 
and  Lifschitz  in  [11]  and  then  generalized  by  them  to  the  class  of  extended 
logic  programs  (see  [12]).  Przymusinski  [25]  enlarged  this  extended  version  to 
include  disjunctive  programs.  Recently,  Inoue  et  al.  [14]  developed  a  bottom-up 
procedure  to  compute  stable  models  of  edlps. 

Intuitively, $M \subseteq \mathcal{L}$ is a stable extended-model of an edlp P if M is a minimal extended-model of the disjunctive program obtained from P by interpreting each negation-by-default of the form not l, $l \in \mathcal{L}$, according to M in the following way: not l is false if $l \in M$ and not l is true if $l \notin M$. In the first case, each clause containing not l in its body may be erased from P, and in the second case, any occurrence of not l may be deleted from clauses in P without altering the meaning of P in the context of M. If M is inconsistent, it is replaced by $\mathcal{L}$. More formally:

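Definition 5. Let P be an edlp and let $M \subseteq \mathcal{L}$. The Gelfond-Lifschitz transformation of P w.r.t. M, $P^M$, is defined as:

$$P^M = \{\, l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m \mid l_1 \vee \dots \vee l_k \leftarrow l_{k+1}, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n \text{ is a clause in } P \text{ and } \{l_{m+1}, \dots, l_n\} \cap M = \emptyset \,\}.$$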
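Definition 5 is a filter followed by a projection. Here is a sketch in a hypothetical `Clause(head, pos, neg)` encoding, where `neg` holds the literals that appear under not:

```python
from typing import FrozenSet, List, NamedTuple


class Clause(NamedTuple):
    head: FrozenSet[str]
    pos: FrozenSet[str]
    neg: FrozenSet[str]   # literals l with "not l" in the body


def gl_reduct(program: List[Clause], m: FrozenSet[str]) -> List[Clause]:
    """Gelfond-Lifschitz transformation P^M (Definition 5).

    A clause survives only if none of its "not l" literals has l in M;
    surviving clauses keep their head and positive body and lose all "not" literals.
    """
    return [Clause(c.head, c.pos, frozenset()) for c in program if not (c.neg & m)]
```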

Definition 6. Let P be an edlp and let $M \subseteq \mathcal{L}$. Then M is a stable extended-model of P iff either:

1. M is consistent and M is a minimal model of $P^M$, or

2. $M = \mathcal{L}$ and there is some inconsistent $N \subseteq \mathcal{L}$ such that N is a minimal model of $P^N$.
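For very small propositional programs, Definition 6 can be applied literally by trying every $M \subseteq \mathcal{L}$. The sketch below is an exhaustive search (exponential in $|\mathcal{L}|$, so useful only for checking examples by hand) under an assumed encoding ("-p" for ¬p, hypothetical `Clause` triples). Minimality is tested by treating literals as atoms, which is harmless in case 1 because every subset of a consistent set is consistent.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterator, List, NamedTuple, Set


class Clause(NamedTuple):
    head: FrozenSet[str]
    pos: FrozenSet[str]
    neg: FrozenSet[str]


def complement(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def subsets(xs) -> Iterator[FrozenSet[str]]:
    xs = sorted(xs)
    return (frozenset(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))


def gl_reduct(program: List[Clause], m: FrozenSet[str]) -> List[Clause]:
    return [Clause(c.head, c.pos, frozenset()) for c in program if not (c.neg & m)]


def satisfies(m: FrozenSet[str], positive: List[Clause]) -> bool:
    # Literals are treated as atoms here, so inconsistent sets are allowed.
    return all(not (c.pos <= m) or bool(c.head & m) for c in positive)


def is_minimal_model(m: FrozenSet[str], positive: List[Clause]) -> bool:
    return satisfies(m, positive) and not any(
        s != m and satisfies(s, positive) for s in subsets(m))


def stable_extended_models(program: List[Clause], literals) -> Set[FrozenSet[str]]:
    """Definition 6 by exhaustive search over all M ⊆ L (tiny programs only)."""
    lits = frozenset(literals)
    result: Set[FrozenSet[str]] = set()
    for m in subsets(lits):
        if is_minimal_model(m, gl_reduct(program, m)):
            consistent = all(complement(l) not in m for l in m)
            result.add(m if consistent else lits)   # case 1, or case 2 with N = m
    return result
```

The three test programs below show the three possible outcomes: two consistent stable models, the single extended-model $\mathcal{L}$, and no stable extended-model at all.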

Example 2. The stable models of the edlp P given in Example 1 are $M_1 = \{a, c\}$ and $M_2 = \{e, b, \neg a\}$. Notice that $M_1$ is a minimal model of $P^{M_1} = \{a \vee e;\ c \leftarrow a;\ \neg b \leftarrow \neg c\}$ and $M_2$ is a minimal model of $P^{M_2} = \{a \vee e;\ \neg b \leftarrow \neg c;\ b \leftarrow e;\ \neg a\}$. No other extended-model of P is stable.

The question now is how to effectively find all stable extended-models of a given edlp P. At this point we can take advantage of the fact that P' is a normal disjunctive logic program and that there are effective procedures to compute the stable models for this restricted class of programs (see [9]). We must, therefore, find a relationship between the stable extended-models of P and the stable models of P'. Such a relationship is given by the following theorem, which generalizes a similar result in [13]. First we need a technical lemma stating that the Gelfond-Lifschitz transformation and the prime transformation commute.

Lemma 7. Let P be an edlp and let $M \subseteq \mathcal{L}$. Then $(P^M)' = P'^{M'}$.

Theorem 8. Let P be an edlp and let $M \subseteq \mathcal{L}$. M is a stable extended-model of P iff

1. M is consistent and M' is a stable model of P', or

2. $M = \mathcal{L}$ and there is some stable model of P' which does not satisfy $IC_{P'}$.
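Theorem 8 suggests computing stable extended-models through P' instead. In this sketch a brute-force enumerator `stable_models` stands in for an effective procedure such as the one in [9]; it is not that procedure. Each stable model N of P' is mapped back to $N^{\neg}$, or to $\mathcal{L}$ when it contains both p and p' for some p. The encoding ("-p" for ¬p, "p'" for the primed atom) is an assumption.

```python
from itertools import chain, combinations
from typing import FrozenSet, List, NamedTuple, Set


class Clause(NamedTuple):
    head: FrozenSet[str]
    pos: FrozenSet[str]
    neg: FrozenSet[str]


def prime_lit(l: str) -> str:
    return l[1:] + "'" if l.startswith("-") else l


def neg_lit(a: str) -> str:
    return "-" + a[:-1] if a.endswith("'") else a


def prime_program(program: List[Clause]) -> List[Clause]:
    p = lambda s: frozenset(prime_lit(l) for l in s)
    return [Clause(p(c.head), p(c.pos), p(c.neg)) for c in program]


def subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]


def stable_models(ndlp: List[Clause], atoms) -> List[FrozenSet[str]]:
    """Brute-force stable models of a normal disjunctive program (stand-in for a real solver)."""
    def sat(m, prog):
        return all(not (c.pos <= m) or bool(c.head & m) for c in prog)

    out = []
    for m in subsets(atoms):
        reduct = [c for c in ndlp if not (c.neg & m)]   # sat() ignores the remaining "not" parts
        if sat(m, reduct) and not any(s != m and sat(s, reduct) for s in subsets(m)):
            out.append(m)
    return out


def stable_extended_models_via_prime(program: List[Clause], literals) -> Set[FrozenSet[str]]:
    """Theorem 8: map each stable model N of P' back to N^neg, or to L if N violates IC_{P'}."""
    lits = frozenset(literals)
    result: Set[FrozenSet[str]] = set()
    for n in stable_models(prime_program(program), {prime_lit(l) for l in lits}):
        violates_ic = any(a + "'" in n for a in n if not a.endswith("'"))
        result.add(lits if violates_ic else frozenset(neg_lit(a) for a in n))
    return result
```

On the test programs below, these are exactly the sets one obtains from Definition 6 by hand.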




4  Fixpoint  Semantics 

In this section we construct a fixpoint operator to compute the extended version of a semantics SEM in terms of a fixpoint operator which computes the restriction of this semantics to ndlps.

Let T be a fixpoint operator with the property that for every ndlp Q, there exists a fixpoint ordinal $\alpha_Q$ such that $T_Q \uparrow \alpha_Q$ characterizes the fixpoint semantics of Q with respect to SEM. The desired fixpoint operator to compute the semantics SEM of edlps can be described in two different but equivalent ways:

1. As an extension of T which works with literals instead of only positive atoms.
As  an  illustration  of  this  approach,  we  present  in  the  following  subsections 
the  extended  fixpoint  operators  for  the  disjunctive  well-founded  semantics 
and  the  stationary  semantics. 

2. As an invocation of T. More precisely, given an edlp P, the fixpoint semantics of P is obtained from the fixpoint computed by T for the ndlp P'. This can be achieved by defining $T_P \uparrow \alpha =_{def} (T_{P'} \uparrow \alpha)^{\neg}$ for every ordinal α. It is easy to see that if $\alpha_{P'}$ is the minimal fixpoint ordinal of $T_{P'}$ then $\alpha_{P'}$ is also the minimal fixpoint ordinal of $T_P$.

This approach enables us to bound the complexity of computing $T_P \uparrow \alpha_P$ as follows: complexity($T_P \uparrow \alpha_P$) ≤ complexity(applying the prime transformation to P) + complexity($T_{P'} \uparrow \alpha_{P'}$) + complexity(applying the neg transformation to $T_{P'} \uparrow \alpha_{P'}$). Note that the complexity of transforming P to P' is linear in the size of P (for any reasonable definition of size).
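The second approach only wraps an existing operator between the two syntactic transformations. To keep the sketch runnable, the operator T below is a stand-in, the immediate consequence operator of definite programs (single-headed, without not), rather than the DWFS or stationary operators; only finite stages $\alpha = n$ are computed, and the clause encoding is an assumption.

```python
from typing import FrozenSet, List, Tuple

# A definite clause "head <- body" over literals; ¬p is written "-p".
DefClause = Tuple[str, FrozenSet[str]]


def prime_lit(l: str) -> str:
    return l[1:] + "'" if l.startswith("-") else l


def neg_lit(a: str) -> str:
    return "-" + a[:-1] if a.endswith("'") else a


def t_up(program: List[DefClause], n: int) -> FrozenSet[str]:
    """Stand-in T: T_Q ↑ n for the immediate consequence operator of a definite program Q."""
    interp: FrozenSet[str] = frozenset()
    for _ in range(n):
        interp = frozenset(h for h, body in program if body <= interp)
    return interp


def t_up_extended(program: List[DefClause], n: int) -> FrozenSet[str]:
    """Second approach: T_P ↑ n := (T_{P'} ↑ n)^neg, i.e. prime, iterate, then map back."""
    primed = [(prime_lit(h), frozenset(prime_lit(l) for l in body)) for h, body in program]
    return frozenset(neg_lit(a) for a in t_up(primed, n))
```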

The  equivalence  of  these  two  approaches  for  the  DWFS  and  the  stationary 
semantics  will  be  established  in  the  subsections  below. 

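In what follows, the notation $T^\alpha$, where T is a fixpoint operator and α is an ordinal, stands for the composition of T with itself α times. That is, if A belongs to the domain of T, then

$$T^\alpha(A) = \begin{cases} A & \text{if } \alpha = 0,\\ T(T^{\alpha-1}(A)) & \text{if } \alpha \text{ is a successor ordinal},\\ \bigcup_{\beta<\alpha} T^\beta(A) & \text{if } \alpha \text{ is a limit ordinal.} \end{cases}$$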
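For a finite domain, the iteration above can be evaluated directly: successor stages are repeated application, and the first limit stage $T^\omega(A)$ is the union of all finite iterates. A minimal sketch, assuming T is given as a Python function on frozensets:

```python
from typing import Callable, FrozenSet

Op = Callable[[FrozenSet[int]], FrozenSet[int]]


def power(t: Op, alpha: int, a: FrozenSet[int]) -> FrozenSet[int]:
    """T^alpha(A) for a finite ordinal alpha: alpha = 0 or a successor."""
    result = a
    for _ in range(alpha):
        result = t(result)   # T^alpha(A) = T(T^(alpha-1)(A))
    return result


def power_omega(t: Op, a: FrozenSet[int]) -> FrozenSet[int]:
    """T^omega(A) = union of T^n(A) for n < omega.

    Assumes a finite domain: the iterates then eventually repeat, and every later
    iterate is one already seen, so the union can stop at the first repetition.
    """
    seen = set()
    acc: FrozenSet[int] = frozenset()
    current = a
    while current not in seen:
        seen.add(current)
        acc |= current
        current = t(current)
    return acc
```

Because the iterates on a finite domain eventually repeat, the loop in `power_omega` terminates without an iteration bound, even when T is not monotone.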

4.1  Disjunctive  Well-Founded  Semantics 

The disjunctive well-founded semantics (DWFS) was defined by Baral [2] as an extension of the well-founded semantics (WFS) [32] to the class of normal disjunctive logic programs. It is also equivalent to the Minker/Rajasekar fixpoint operator on the class of disjunctive logic programs [21]. In the same spirit as WFS, DWFS is a 3-valued semantics which associates with each normal disjunctive logic program P a state-pair, i.e. a tuple $S = \langle T_S; F_S \rangle$ where $T_S$ is a set of disjunctions of atoms ($T_S \subseteq DHB_P$) and $F_S$ is a set of conjunctions of atoms ($F_S \subseteq CHB_P$), with the closure properties that if $D \in T_S$, $C \in F_S$ and $l \in \mathcal{L}$ then $D \vee l \in T_S$ and $C \wedge l \in F_S$. The intended meanings of $T_S$ and $F_S$ are as follows: $T_S$ contains all the disjunctions that are assumed to be true in P under this semantics, and $F_S$ contains all the conjunctions that are assumed to be false-by-default in P, i.e. the conjunctions produced by the particular default rule for negation used by this semantics.

We describe the extension of DWFS to the class of edlps by mimicking the definition of DWFS for the class of ndlps (see [16]) but working now with literals instead of only positive atoms. To do so, we need the following definitions:

Definition 9. Let P be an edlp. An extended state-pair S is a tuple $\langle T_S; F_S \rangle$ where $T_S \subseteq \mathcal{DL}_P$ and $F_S \subseteq \mathcal{CL}_P$, such that $T_S$ is closed under disjunctions with literals in $\mathcal{L}$, that is, for all literals $l \in \mathcal{L}$, if $D \in T_S$ then $D \vee l \in T_S$; and $F_S$ is closed under conjunctions with literals in $\mathcal{L}$, i.e. for all literals $l \in \mathcal{L}$, if $C \in F_S$ then $C \wedge l \in F_S$.

Notice that since, in the extended framework with explicit negation, we are allowed to define when an atom p is true as well as when it is false (i.e. when ¬p is true), it is no longer adequate to say that a formula φ is false with respect to an extended state-pair $S = \langle T_S; F_S \rangle$ if $F_S \models \varphi$. Instead, we define:

Definition 10. Let $S = \langle T_S; F_S \rangle$ be an extended state-pair and let φ be a formula. The truth value of φ with respect to S is given by: φ is true if $T_S \models \varphi$; φ is false if $T_S \models \neg\varphi$; φ is unknown if $T_S \not\models \varphi$ and $T_S \not\models \neg\varphi$; and φ is false-by-default if $F_S \models \varphi$.

We say that an extended state-pair S is inconsistent if there exists a formula φ such that $T_S \models \varphi$ and $T_S \models \neg\varphi$.
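For propositional state-pairs whose $T_S$ is the disjunctive closure of finitely many generators, the first three cases of Definition 10 and the inconsistency test reduce to classical entailment checks. The sketch assumes that ⊨ is classical entailment with ¬ read as classical negation, and it enumerates truth assignments, so it is exponential in the number of atoms. Since the closure only adds weaker disjunctions, entailment from the generators coincides with entailment from $T_S$. The false-by-default case is not covered, and the encoding ("-p" for ¬p, disjunctions as frozensets) is an assumption.

```python
from itertools import product
from typing import FrozenSet, Iterable

Disj = FrozenSet[str]   # a disjunction of literals, e.g. frozenset({"a", "-b"}) for a v ¬b


def complement(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def atom(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else lit


def holds(lit: str, v: dict) -> bool:
    return not v[atom(lit)] if lit.startswith("-") else v[atom(lit)]


def assignments(gens: Iterable[Disj], extra: Iterable[str] = ()):
    """All classical truth assignments over the atoms mentioned that satisfy every generator."""
    gens = list(gens)
    atoms = sorted({atom(l) for c in gens for l in c} | {atom(l) for l in extra})
    for bits in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if all(any(holds(l, v) for l in c) for c in gens):
            yield v


def entails(gens, lit: str) -> bool:
    return all(holds(lit, v) for v in assignments(gens, [lit]))


def truth_value(gens, lit: str) -> str:
    """Definition 10 for a literal, with T_S given by finitely many generators."""
    if entails(gens, lit):
        return "true"
    if entails(gens, complement(lit)):
        return "false"
    return "unknown"


def is_inconsistent(gens) -> bool:
    # Some φ with T_S |= φ and T_S |= ¬φ exists iff T_S has no classical model.
    return not any(True for _ in assignments(gens))
```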

The extended state-pair associated with an edlp P by DWFS is constructed as the smallest fixpoint of an operator defined on the collection of extended state-pairs. Given an extended state-pair $S = \langle T_S; F_S \rangle$, $\mathcal{S}^{D}_P$ augments $T_S$ with all the disjunctions that can be deduced from P in one or more steps (i.e. in one or more applications of an immediate consequence operator defined below) under the assumption that the formulas in $T_S$ are true and the ones in $F_S$ are false-by-default. Similarly, $\mathcal{S}^{D}_P$ augments $F_S$ with all the conjunctions which can be proved to be false-by-default in one or more steps (i.e. either the conjunction is assumed to be false-by-default or, for each rule in P capable of deducing the conjunction, it is possible to prove that its body is false or false-by-default), supposing again that the formulas in $T_S$ are true and the ones in $F_S$ are false-by-default. Formally:

Definition 11. Given an edlp P,

1. the extended state-pair assigned by DWFS to P is the smallest fixpoint of the operator $\mathcal{S}^{D}_P$ defined on extended state-pairs in such a way that for any extended state-pair $S = \langle T_S; F_S \rangle$:

$$\mathcal{S}^{D}_P(S) = \Big\langle T_S \cup \Big[\bigcup_{1 \le n < \omega} (\mathcal{T}_S)^n(\emptyset)\Big];\ F_S \cup \Big[\bigcap_{1 \le n < \omega} (\mathcal{F}_S)^n(\mathcal{CL}_P)\Big] \Big\rangle$$

where:

$\mathcal{T}_S(T) = \{D \in \mathcal{DL}_P \mid$ there is a ground instance of a clause $D' \leftarrow l_1, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$ in P, where $D' \in \mathcal{DL}_P$, such that for all i, $1 \le i \le m$, $l_i \vee D_i \in T_S \cup T$ for some (possibly null) $D_i \in \mathcal{DL}_P$, $\{l_{m+1}, \dots, l_n\} \subseteq F_S$, and $D' \vee D_1 \vee \dots \vee D_m \Rightarrow D$*$\}$

$\mathcal{F}_S(F) = \{C \in \mathcal{CL}_P \mid$ for all ground instances of clauses of the form $A \vee E \leftarrow l_1, \dots, l_m, \mathit{not}\ l_{m+1}, \dots, \mathit{not}\ l_n$ in P, where E is a (possibly null) disjunction of literals and $C \Rightarrow A$, at least one of the following cases holds:

(a) $m \ge 1$ and $l_1 \wedge \dots \wedge l_m \in F_S \cup F$;

(b) $n \ge m + 1$ and $l_{m+1} \vee \dots \vee l_n \in T_S$ $\}$.

2. the extended state-pair characterizing the semantics DWFS of P is defined as the extended state-pair S assigned by DWFS to P if S is consistent, and as $\langle \mathcal{DL}_P; \mathcal{CL}_P \rangle$ otherwise. In the latter case, the program P is called inconsistent w.r.t. DWFS.

In order to simplify the notation in the following examples, we use the following convention: if $T \subseteq \mathcal{DL}_P$ then $\text{disj-closure}_{\mathcal{L}}(T)$ denotes the smallest subset of $\mathcal{DL}_P$ containing T that is closed under disjunctions with literals in $\mathcal{L}$. If $F \subseteq \mathcal{CL}_P$ then $\text{conj-closure}_{\mathcal{L}}(F)$ denotes the smallest subset of $\mathcal{CL}_P$ containing F that is closed under conjunctions with literals in $\mathcal{L}$.
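If disjunctions and conjunctions are identified up to commutativity and idempotence, so that each can be represented as a set of literals (an assumption of this sketch), membership in these closures reduces to a subset test against the generators:

```python
from typing import FrozenSet, Iterable

Lits = FrozenSet[str]   # a disjunction (or conjunction) of literals, as a set


def in_disj_closure(d: Lits, generators: Iterable[Lits]) -> bool:
    """D ∈ disj-closure_L(T) iff D extends some generator t ∈ T by extra disjuncts."""
    return any(t <= d for t in generators)


def in_conj_closure(c: Lits, generators: Iterable[Lits]) -> bool:
    """C ∈ conj-closure_L(F) iff C extends some generator f ∈ F by extra conjuncts."""
    return any(f <= c for f in generators)
```

This is why a finite generator set such as {a ∨ e} is enough to describe the closures used in the examples that follow.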

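Example 3. The extended state-pair characterizing the DWFS of the edlp given in Example 1 is:

$$S = \langle \text{disj-closure}_{\mathcal{L}}(\{a \vee e\});\ \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e\}) \rangle,$$

which is obtained from:

$$\bigcup_{1 \le n < \omega} (\mathcal{T}_{\langle\emptyset;\emptyset\rangle})^n(\emptyset) = \text{disj-closure}_{\mathcal{L}}(\{a \vee e\}),$$

$$(\mathcal{F}_{\langle\emptyset;\emptyset\rangle})^1(\mathcal{CL}_P) = \text{conj-closure}_{\mathcal{L}}(\{b, \neg b, c, \neg c, \neg e\}),$$

$$\bigcap_{1 \le n < \omega} (\mathcal{F}_{\langle\emptyset;\emptyset\rangle})^n(\mathcal{CL}_P) = \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e\}) = (\mathcal{F}_{\langle\emptyset;\emptyset\rangle})^3(\mathcal{CL}_P).$$

Then,

$$\mathcal{S}^{D}_P(\langle\emptyset;\emptyset\rangle) = \langle \text{disj-closure}_{\mathcal{L}}(\{a \vee e\});\ \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e\}) \rangle = (\mathcal{S}^{D}_P)^2(\langle\emptyset;\emptyset\rangle).$$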

Theorem 12. Let P be an edlp consistent w.r.t. DWFS and let $S = \langle T_S; F_S \rangle$ be an extended state-pair. Then S characterizes the semantics DWFS of P iff $S' = \langle T'_S; F'_S \rangle$ characterizes the semantics DWFS of P' and $T'_S$ satisfies $IC_{P'}$.

A similar approach can be followed to extend the generalized disjunctive well-founded semantics (GDWFS) and $WF^3$ to the class of edlps.


* $\Rightarrow$ denotes logical implication.




4.2  Stationary  Semantics 

The stationary semantics introduced by Przymusinski [26] also associates a state-pair with each ndlp. The construction of this state-pair (see [16]) relies on the notions of Extended Generalized Closed World Assumption (EGCWA) [33] and stationary transformation, which we generalize to the extended case as follows:

Definition 13. Let P be an edlp and let S be an extended state-pair. The stationary transformation of P with respect to S, denoted by Sta(P, S), is the edlp free of negation-by-default† obtained from P by:

1. removing each clause in P whose body is false‡ or false-by-default with respect to S;

2. removing the bodies of the remaining clauses in P.

An easy way to extend the EGCWA to the class of edlps free of negation-by-default is to use the prime transformation described in Sect. 2.

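Definition 14. Let P be an edlp free of negation-by-default. We define $EGCWA^{\neg}(P)$ as $(EGCWA(P'))^{\neg}$, i.e.

$$EGCWA^{\neg}(P) = \{\mathit{not}\ l_1 \vee \dots \vee \mathit{not}\ l_n \mid l_1, \dots, l_n \in \mathcal{L} \text{ and } \mathit{not}\ l'_1 \vee \dots \vee \mathit{not}\ l'_n \text{ is true in every minimal model of } P'\}.$$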

The extended state-pair which characterizes the stationary semantics of an edlp P is the smallest fixpoint of an operator $\mathcal{S}^{ST}_P$ defined on the set of extended state-pairs in such a way that, for any given extended state-pair $S = \langle T_S; F_S \rangle$, $\mathcal{S}^{ST}_P$ adds to $T_S$ the set of all the disjunctions that are logical consequences of the union of P with $T_S$ and the negation of the conjunctions in $F_S$. On the other hand, $\mathcal{S}^{ST}_P$ adds to $F_S$ all the conjunctions that can be assumed to be false-by-default under the $EGCWA^{\neg}$ from the stationary transformation of P with respect to S and $T^{ST}_S$.

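In what follows, $\mathit{not}\ l_1 \vee \dots \vee \mathit{not}\ l_n$ is abbreviated as $\mathit{not}(l_1 \wedge \dots \wedge l_n)$; if C is a set of conjunctions of literals then $\mathit{not}(C)$ denotes the set $\{\mathit{not}(c) \mid c \in C\}$; and $\mathit{not}(\mathit{not}\ p) = p$.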

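Definition 15. Given an edlp P,

1. the extended state-pair assigned by the stationary semantics to P is the smallest fixpoint of the operator $\mathcal{S}^{ST}_P$ defined on extended state-pairs in such a way that for any extended state-pair $S = \langle T_S; F_S \rangle$:

$$\mathcal{S}^{ST}_P(S) = \langle T_S \cup T^{ST}_S;\ F_S \cup F^{ST}_S \rangle$$

where:

$$T^{ST}_S = \{D \in \mathcal{DL}_P \mid P \cup T_S \cup \mathit{not}(F_S) \models D\}$$ and

$$F^{ST}_S = \mathit{not}\big(EGCWA^{\neg}(Sta(P, \langle T^{ST}_S \cup T_S; F_S \rangle) \cup T^{ST}_S \cup T_S)\big).$$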


† By this, we mean that the operator not does not appear in the program.

‡ The truth value of a formula with respect to an extended state-pair is given in Def. 10.




2. the extended state-pair characterizing the stationary semantics of P is defined as the extended state-pair S assigned by the stationary semantics to P if S is consistent, and as $\langle \mathcal{DL}_P; \mathcal{CL}_P \rangle$ otherwise. In the latter case, the program P is called inconsistent w.r.t. the stationary semantics.

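Example 4. The extended state-pair which characterizes the stationary semantics of the edlp P given in Example 1 is:

$$S = \langle \text{disj-closure}_{\mathcal{L}}(\{a \vee e\});\ \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e, a \wedge e\}) \rangle,$$

which is obtained from:

$T^{ST}_{\langle\emptyset;\emptyset\rangle} = \text{disj-closure}_{\mathcal{L}}(\{a \vee e\})$. Let us call this set D.

$F^{ST}_{\langle\emptyset;\emptyset\rangle} = \mathit{not}(EGCWA^{\neg}(\{a \vee e, c, \neg b, b, \neg a\} \cup \{a \vee e\})) = \text{conj-closure}_{\mathcal{L}}(\{\neg c, \neg e, a \wedge e\})$. Let us call this set C. Then $\mathcal{S}^{ST}_P(\langle\emptyset;\emptyset\rangle) = \langle D; C \rangle$ and

$T^{ST}_{\langle D; C \rangle} = \text{disj-closure}_{\mathcal{L}}(\{a \vee e\})$,

$F^{ST}_{\langle D; C \rangle} = \mathit{not}(EGCWA^{\neg}(\{a \vee e, c, b, \neg a\} \cup \{a \vee e\})) = \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e, a \wedge e\})$. Therefore,

$$(\mathcal{S}^{ST}_P)^2(\langle\emptyset;\emptyset\rangle) = \mathcal{S}^{ST}_P(\langle D; C \rangle) = \langle \text{disj-closure}_{\mathcal{L}}(\{a \vee e\});\ \text{conj-closure}_{\mathcal{L}}(\{\neg b, \neg c, \neg e, a \wedge e\}) \rangle = (\mathcal{S}^{ST}_P)^3(\langle\emptyset;\emptyset\rangle).$$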
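The two EGCWA steps of Example 4 can be replayed mechanically. The sketch below implements Definition 14 for a set of disjunctive facts, which is what the stationary transformation produces once bodies are removed. The fact lists are those of Example 4, with $T^{ST}$ represented by its generator a ∨ e (weaker disjunctions do not change the minimal models). Only conjunctions of at most `max_size` literals are searched, an assumption that is sufficient here; the returned sets are the generators of the conj-closure. The encoding ("-p" for ¬p, "p'" for primed atoms) is also an assumption.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Lits = FrozenSet[str]


def prime_lit(l: str) -> str:
    return l[1:] + "'" if l.startswith("-") else l


def subsets(xs, max_size=None) -> List[Lits]:
    xs = sorted(xs)
    top = len(xs) if max_size is None else max_size
    return [frozenset(c) for r in range(top + 1) for c in combinations(xs, r)]


def minimal_models(facts: List[Lits], atoms) -> List[Lits]:
    """Minimal models of a set of positive disjunctive facts (what Sta(P, S) produces)."""
    models = [m for m in subsets(atoms) if all(c & m for c in facts)]
    return [m for m in models if not any(o < m for o in models)]


def egcwa_neg_generators(facts: Iterable[Lits], literals, max_size: int = 2) -> Set[Lits]:
    """Minimal conjunctions l1 ∧ ... ∧ ln (n <= max_size) such that
    not l1' v ... v not ln' is true in every minimal model of the primed facts (Definition 14)."""
    primed = [frozenset(prime_lit(l) for l in c) for c in facts]
    mms = minimal_models(primed, {prime_lit(l) for l in literals})
    gens: List[Lits] = []
    for conj in subsets(literals, max_size):          # smallest conjunctions first
        if not conj or any(g <= conj for g in gens):  # skip ones already in the closure
            continue
        if not any(frozenset(prime_lit(l) for l in conj) <= m for m in mms):
            gens.append(conj)
    return set(gens)
```

With the first fact list the generators are ¬c, ¬e and a ∧ e (the set C above); with the second they are ¬b, ¬c, ¬e and a ∧ e.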

Theorem 16. Let P be an edlp consistent w.r.t. the stationary semantics and let $S = \langle T_S; F_S \rangle$ be an extended state-pair. Then S characterizes the stationary semantics of P iff $S' = \langle T'_S; F'_S \rangle$ characterizes the stationary semantics of P' and $T'_S$ satisfies $IC_{P'}$.

5  Proof  Theory 

In this section we describe a procedure to answer queries w.r.t. edlps and an arbitrary semantics SEM. This procedure uses, as a subroutine, a procedure to answer queries w.r.t. ndlps and SEM. Notice that to obtain an effective proof procedure we must restrict ourselves to programs containing only a finite number of clauses. However, this is not a strong restriction, since the proof procedure described below is capable of dealing with non-ground programs.

We will allow queries q of the following sort:

- $q(\bar{X}) = l(\bar{X})$, where l is a literal.

- $q(\bar{X}) = \mathit{not}\ l(\bar{X})$, where l is a literal.

- $q(\bar{X}) = q_1(\bar{X}) \wedge q_2(\bar{X})$, where $q_1$ and $q_2$ are queries.

- $q(\bar{X}) = q_1(\bar{X}) \vee q_2(\bar{X})$, where $q_1$ and $q_2$ are queries.

where $\bar{X} = X_1, \dots, X_n$ lists all the free variables in q, which are interpreted as being existentially quantified.

We define the correct answer to a query as true, false, or unknown according to the following definition:




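Definition 17. Given a query q, we define the correct answer to the query with respect to a consistent edlp P and semantics SEM to be the truth value determined as follows:

1. If q is ground then:

$$answer(P, q) = \begin{cases} \text{true}, & \text{if for all } M \in \mathcal{M}_P^{SEM},\ M \models q,\\ \text{false}, & \text{if for all } M \in \mathcal{M}_P^{SEM},\ M \models \neg q,\\ \text{unknown}, & \text{otherwise.} \end{cases}$$

2. If $q = q(\bar{X})$, where $\bar{X} = X_1, \dots, X_n$ lists all the free variables in q, then:

$$answer(P, q(\bar{X})) = \begin{cases} \text{true}, & \text{if there exists } \bar{a} \in U^n \text{ s.t. } answer(P, q(\bar{a})) = \text{true},\\ \text{false}, & \text{if for all } \bar{a} \in U^n,\ answer(P, q(\bar{a})) = \text{false},\\ \text{unknown}, & \text{otherwise.} \end{cases}$$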

In the unlikely event that false-by-default is considered an appropriate answer to be given to a user, it can be defined as answer(P, q) = false-by-default if answer(P, not(q)) = true, where not(q) stands for the query logically equivalent to not q in which the operator not appears only in front of atomic formulas. Finally, we define what we mean by $M \models q$ as follows:

Definition 18. Given a model M and a query q, we define $M \models q$ as:

1. If q is ground then:

- If q = l, where l is a literal: $M \models l$ iff $l \in M$.

- If q = not l, where l is a literal: $M \models \mathit{not}\ l$ iff $l \notin M$.

- If $q = q_1 \wedge q_2$, where $q_1$ and $q_2$ are queries: $M \models q_1 \wedge q_2$ iff $M \models q_1$ and $M \models q_2$.

- If $q = q_1 \vee q_2$, where $q_1$ and $q_2$ are queries: $M \models q_1 \vee q_2$ iff $M \models q_1$ or $M \models q_2$.

2. Otherwise, if $q = q(\bar{X})$, where $\bar{X} = X_1, \dots, X_n$ lists all the free variables in q, then:

- $M \models q(\bar{X})$ iff $M \models q(\bar{a})$ for every $\bar{a} \in U^n$.
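Definitions 17 and 18 can be sketched for ground queries over a finite set of extended-models. The tuple encoding of queries below is hypothetical, and ¬q is pushed to the literals with De Morgan's laws; the sketch does not define ¬ applied to a not-query and does not cover the non-ground case.

```python
from typing import FrozenSet, Iterable, Tuple

Query = Tuple  # ("lit", l) | ("not", l) | ("and", q1, q2) | ("or", q1, q2)


def complement(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def sat(m: FrozenSet[str], q: Query) -> bool:
    """M |= q for a ground query (Definition 18, case 1)."""
    tag = q[0]
    if tag == "lit":
        return q[1] in m
    if tag == "not":
        return q[1] not in m
    if tag == "and":
        return sat(m, q[1]) and sat(m, q[2])
    if tag == "or":
        return sat(m, q[1]) or sat(m, q[2])
    raise ValueError(f"unknown query tag {tag!r}")


def classical_neg(q: Query) -> Query:
    """¬q pushed down to literals with De Morgan's laws."""
    tag = q[0]
    if tag == "lit":
        return ("lit", complement(q[1]))
    if tag == "and":
        return ("or", classical_neg(q[1]), classical_neg(q[2]))
    if tag == "or":
        return ("and", classical_neg(q[1]), classical_neg(q[2]))
    raise ValueError("this sketch does not define ¬ applied to a 'not' query")


def answer(models: Iterable[FrozenSet[str]], q: Query) -> str:
    """Correct answer to a ground query (Definition 17, case 1) over a finite M_P^SEM."""
    models = list(models)
    if all(sat(m, q) for m in models):
        return "true"
    if all(sat(m, classical_neg(q)) for m in models):
        return "false"
    return "unknown"
```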

Our main purpose now is to define a sound and complete proof procedure $EPP_{SEM}$ to answer queries w.r.t. edlps and SEM. To do so, we assume we have a proof procedure to answer queries w.r.t. ndlps and SEM, which we call $PP_{SEM}$. For clarity, we denote by $PP_{SEM}(Q, q)$ the answer of the procedure $PP_{SEM}$ to query q w.r.t. an ndlp Q, and by $EPP_{SEM}(P, q)$ the answer of the procedure $EPP_{SEM}$ to query q w.r.t. an edlp P. Remember that an edlp (in contrast with an ndlp) can be inconsistent. In that case the answer to any query should be true. To take that into consideration we introduce the following definition.

Definition 19. Given an edlp P, we define the ndlp $P^{IC}$:

$$P^{IC} = P' \cup \{\perp \leftarrow p, p' \mid p \text{ is a predicate symbol in } \mathcal{L}_P\},$$

where ⊥ is a new predicate symbol not in $\mathcal{L}_P$.




Given an edlp P, we can determine whether or not this program is inconsistent w.r.t. SEM using a sound and complete proof procedure $PP_{SEM}$ and the following characterization:

P is inconsistent w.r.t. SEM iff $PP_{SEM}(P^{IC}, \perp) = \text{true}$.

In this way, determining whether the program is inconsistent can be done as a preprocessing step, before any attempt to answer queries.
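Here is a sketch of Definition 19 and of the inconsistency test for propositional programs under the stable model semantics. The reserved atom `bot` stands for ⊥ and is assumed not to occur in P; `stable_models` is a brute-force stand-in for $PP_{SEM}$, not an actual proof procedure. The sketch additionally requires at least one stable model, so that an incoherent program is not reported as inconsistent.

```python
from itertools import chain, combinations
from typing import FrozenSet, List, NamedTuple

BOT = "bot"   # hypothetical reserved atom standing for ⊥; assumed not to occur in P


class Clause(NamedTuple):
    head: FrozenSet[str]
    pos: FrozenSet[str]
    neg: FrozenSet[str]


def prime_lit(l: str) -> str:
    return l[1:] + "'" if l.startswith("-") else l


def p_ic(program: List[Clause]) -> List[Clause]:
    """P^IC = P' ∪ {⊥ <- p, p' | p an atom of P} (Definition 19, propositional case)."""
    p = lambda s: frozenset(prime_lit(l) for l in s)
    primed = [Clause(p(c.head), p(c.pos), p(c.neg)) for c in program]
    atoms = {l.lstrip("-") for c in program for l in c.head | c.pos | c.neg}
    return primed + [Clause(frozenset({BOT}), frozenset({a, a + "'"}), frozenset()) for a in sorted(atoms)]


def stable_models(ndlp: List[Clause]) -> List[FrozenSet[str]]:
    """Brute-force stable models; a stand-in for PP_SEM with SEM = stable model semantics."""
    atoms = sorted({a for c in ndlp for a in c.head | c.pos | c.neg})
    subsets = lambda xs: [frozenset(s) for s in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]
    sat = lambda m, prog: all(not (c.pos <= m) or bool(c.head & m) for c in prog)
    out = []
    for m in subsets(atoms):
        red = [c for c in ndlp if not (c.neg & m)]
        if sat(m, red) and not any(s != m and sat(s, red) for s in subsets(sorted(m))):
            out.append(m)
    return out


def is_inconsistent(program: List[Clause]) -> bool:
    """P is inconsistent iff ⊥ holds in every stable model of P^IC (here: and one exists)."""
    models = stable_models(p_ic(program))
    return bool(models) and all(BOT in m for m in models)
```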

Now  we  are  ready  to  describe  the  extended  proof  procedure. 

Definition 20. Let $PP_{SEM}$ be a proof procedure to answer queries w.r.t. ndlps and SEM. We define an extension of this procedure, called $EPP_{SEM}$, which answers queries w.r.t. edlps and SEM as follows:

Given an edlp P and a query q do:

- If P is inconsistent w.r.t. SEM then $EPP_{SEM}(P, q) = \text{true}$.

- Otherwise:

1. Simultaneously apply $PP_{SEM}$ to the query $((q)' \vee \perp)$ and the query $((\neg q)' \vee \perp)$§ w.r.t. program $P^{IC}$.

2. If $PP_{SEM}(P^{IC}, (q)' \vee \perp) = \text{true}$ then $EPP_{SEM}(P, q) = \text{true}$.

3. If $PP_{SEM}(P^{IC}, (\neg q)' \vee \perp) = \text{true}$ then $EPP_{SEM}(P, q) = \text{false}$.

4. Otherwise $EPP_{SEM}(P, q) = \text{unknown}$.

It can be shown that $EPP_{SEM}$ is well defined and gives the correct answer (in the sense of Def. 17). The following example illustrates the need for “⊥” in the previous definition.

Example 5. Consider the edlp $P = \{a \vee b;\ a \vee \neg b;\ \neg a \vee b\}$. Hence, $P' = \{a \vee b;\ a \vee b';\ a' \vee b\}$ and $P^{IC} = \{a \vee b;\ a \vee b';\ a' \vee b;\ \perp \leftarrow a, a';\ \perp \leftarrow b, b'\}$. The set of stable models of $P^{IC}$ is $\mathcal{M}_{P^{IC}}^{stable} = \{\{a, b\}, \{a, a', \perp\}, \{b, b', \perp\}\}$ and $\mathcal{M}_P^{stable} = \{\{a, b\}, \mathcal{L}\}$. Notice that a is true w.r.t. P and the stable model semantics, but a does not hold in every minimal model in $\mathcal{M}_{P^{IC}}^{stable}$, since it fails to hold in $\{b, b', \perp\}$. It is clear, however, that a holds in every consistent (w.r.t. $IC_{P'}$) model in $\mathcal{M}_{P^{IC}}^{stable}$. Therefore $(a \vee \perp)$ is true w.r.t. $P^{IC}$ and the stable model semantics; hence, by Definition 20, $EPP_{stable}(P, a) = PP_{stable}(P^{IC}, a \vee \perp) = \text{true}$, as desired.
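Definition 20 can be sketched for literal queries, with a brute-force skeptical stable-model check standing in for $PP_{SEM}$ (an assumption; any sound and complete procedure could be passed as `pp`). The two calls are made one after the other rather than simultaneously, `bot` is a hypothetical reserved atom for ⊥, and the test replays Example 5.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, List, NamedTuple

BOT = "bot"   # hypothetical reserved atom for ⊥


class Clause(NamedTuple):
    head: FrozenSet[str]
    pos: FrozenSet[str]
    neg: FrozenSet[str]


def complement(l: str) -> str:
    return l[1:] if l.startswith("-") else "-" + l


def prime_lit(l: str) -> str:
    return l[1:] + "'" if l.startswith("-") else l


def p_ic(program: List[Clause]) -> List[Clause]:
    p = lambda s: frozenset(prime_lit(l) for l in s)
    atoms = sorted({l.lstrip("-") for c in program for l in c.head | c.pos | c.neg})
    return ([Clause(p(c.head), p(c.pos), p(c.neg)) for c in program]
            + [Clause(frozenset({BOT}), frozenset({a, a + "'"}), frozenset()) for a in atoms])


def pp_stable(ndlp: List[Clause], disjunction: FrozenSet[str]) -> bool:
    """Stand-in PP_SEM: is the disjunction of atoms true in every stable model? (brute force)"""
    atoms = sorted({a for c in ndlp for a in c.head | c.pos | c.neg})
    subs = lambda xs: [frozenset(s) for s in chain.from_iterable(combinations(sorted(xs), r) for r in range(len(xs) + 1))]
    sat = lambda m, prog: all(not (c.pos <= m) or bool(c.head & m) for c in prog)
    for m in subs(atoms):
        red = [c for c in ndlp if not (c.neg & m)]
        if sat(m, red) and not any(s != m and sat(s, red) for s in subs(m)):
            if not (disjunction & m):
                return False      # a stable model refutes the disjunction
    return True                   # vacuously true if there is no stable model


def epp(program: List[Clause], lit: str,
        pp: Callable[[List[Clause], FrozenSet[str]], bool] = pp_stable) -> str:
    """Definition 20 for literal queries: calls to pp on P^IC, each query padded with ⊥."""
    pic = p_ic(program)
    if pp(pic, frozenset({BOT})):
        return "true"                                  # P is inconsistent
    if pp(pic, frozenset({prime_lit(lit), BOT})):
        return "true"
    if pp(pic, frozenset({prime_lit(complement(lit)), BOT})):
        return "false"
    return "unknown"
```

Dropping ⊥ from the query makes the check for a fail on this program, because the stable model {b, b', ⊥} of $P^{IC}$ does not contain a.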

The complexity of $EPP_{SEM}$ can be expressed in terms of the complexity of $PP_{SEM}$ as follows: complexity($EPP_{SEM}$) = 2 complexity($PP_{SEM}$) + O(n), where n is the size of q and O(n) corresponds to the complexity of translating q to q' and to $(\neg q)'$.

Theorem 21. Let $PP_{SEM}$ and $EPP_{SEM}$ be as in Def. 20. The following statements hold:

§ By ¬q we mean the query logically equivalent to ¬q in which ¬ appears only in front of atomic formulas. It is obtained by applying De Morgan's laws as needed.




1. If $PP_{SEM}$ is a sound proof procedure w.r.t. normal disjunctive logic programs, so is $EPP_{SEM}$ w.r.t. extended disjunctive logic programs.

2. If $PP_{SEM}$ is a complete proof procedure w.r.t. normal disjunctive logic programs, so is $EPP_{SEM}$ w.r.t. extended disjunctive logic programs.

This  general  extension  may  be  applied,  for  instance,  to  the  proof  procedure 
developed  by  Fernandez  and  Lobo  [8]  to  answer  queries  w.r.t.  ndlps  and  the 
stable  model  semantics. 


6  Complexity 

The complexity of different problems related to normal logic programs has been studied extensively (see e.g. [6], [30]). In this section we show how to express the complexities of three fundamental problems for edlps in terms of the corresponding complexities for ndlps. Also, some complexity results for the stable model semantics are surveyed here and extended to cover logic programs with classical negation.

In what follows, P denotes a finite edlp, and n denotes the size of the program (which can be measured, for instance, as the number of clauses in the program or as the number of literals appearing in it). Also, a normal literal denotes either a literal $l \in \mathcal{L}$ or the negation-by-default, not l, of such a literal.


Determining if a ground normal literal is true in some model in $\mathcal{M}_P^{SEM}$. Let us denote this problem by credulous-truth(SEM, edlps).

As shown in Sect. 5, a ground normal literal g is true in some $M \in \mathcal{M}_P^{SEM}$ if and only if $(g' \vee \perp)$ is true in some $N \in \mathcal{M}_{P^{IC}}^{SEM}$. Therefore, to determine credulous-truth of g in $\mathcal{M}_P^{SEM}$ it is enough to transform P to $P^{IC}$ and g to g' (which clearly can be done in time linear in the size of the program), and then to check credulous-truth of g' and (if necessary) credulous-truth of ⊥ in $\mathcal{M}_{P^{IC}}^{SEM}$. Then,


complexity(credulous-truth(SEM, ndlps))

≤ complexity(credulous-truth(SEM, edlps))

≤ 2 complexity(credulous-truth(SEM, ndlps)) + O(n).

The first inequality holds since any ndlp is also an edlp. As an illustration, consider the stable model semantics. Marek and Truszczyński showed in [19] that for propositional normal logic programs (prop-nlps), credulous-truth(stable, prop-nlps) is NP-complete. Therefore, for propositional extended normal logic programs (prop-enlps), credulous-truth(stable, prop-enlps) is also NP-complete.


Determining if a ground normal literal is true in every model in $\mathcal{M}_P^{SEM}$. Let us denote this problem by skeptical-truth(SEM, edlps).

Notice that a ground normal literal g is true in every model in $\mathcal{M}_P^{SEM}$ if and only if not g does not hold in any model in $\mathcal{M}_P^{SEM}$. Hence,




complexity(skeptical-truth(SEM, edlps))

= complexity(complement(credulous-truth(SEM, edlps))).

Then, for the stable model semantics and propositional extended normal logic programs, skeptical-truth(stable, prop-enlps) is co-NP-complete, which extends the result in [19] to extended logic programs.
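Over an explicitly listed, finite set of models (an assumption that sidesteps how $\mathcal{M}_P^{SEM}$ is obtained), the reduction above is a one-line complement of the credulous check. The pair encoding of normal literals is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

NormalLit = Tuple[str, str]   # ("lit", l) for l, or ("not", l) for not l


def holds(m: FrozenSet[str], g: NormalLit) -> bool:
    kind, l = g
    return (l in m) if kind == "lit" else (l not in m)


def flip(g: NormalLit) -> NormalLit:
    """l -> not l, and not l -> l."""
    kind, l = g
    return ("not", l) if kind == "lit" else ("lit", l)


def credulous(models: Iterable[FrozenSet[str]], g: NormalLit) -> bool:
    return any(holds(m, g) for m in models)


def skeptical(models: Iterable[FrozenSet[str]], g: NormalLit) -> bool:
    """g holds in every model iff 'not g' holds in none: the complement of a credulous check."""
    return not credulous(models, flip(g))
```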


Determining if a program has a model w.r.t. SEM. Notice that by Theorem 4, an edlp P has a model w.r.t. SEM if and only if P' does. Therefore:

complexity(checking existence of a model of P w.r.t. SEM)

= complexity(checking existence of a model of P' w.r.t. SEM) + O(n).

The O(n) term in the previous equation comes from translating P to P'. For propositional normal logic programs, Marek and Truszczyński [18] proved that determining the existence of stable models is NP-complete; hence, for the class of propositional extended normal logic programs this problem is also NP-complete. Marek, Nerode and Remmel [17] showed that the set of Gödel numbers of finite predicate normal logic programs that have a stable model is a $\Sigma^1_1$-complete set. It is easy to see that their result also holds for finite predicate extended normal logic programs, since the prime transformation is recursive and indeed can be performed in polynomial time.

7  Conclusions 

Two criticisms have been made about the use of classical negation in logic programming. The first one states that there may be cases in which it is not feasible to include every piece of negative information in the domain of the problem. In this regard we note that Gelfond and Lifschitz [12] have pointed out that when the positive information about some predicate p is complete in a program (that is, all conditions under which p is true are given in the program), it is enough to define ¬p as “¬p ← not p”. This remark also justifies, to some extent, keeping negation-by-default in extended logic programs.

The  second  criticism  addresses  the  potential  inconsistency  of  extended  logic 
programs.  In  this  respect  we  have  described  here  concrete  mechanisms  to  deal 
with  inconsistent  logic  programs. 

These  difficulties  seem  to  be  a  small  price  to  pay  for  the  increased  expressive 
power  gained  by  the  explicit  use  of  classical  negation  in  logic  programs. 

8  Acknowledgements 

Support  for  this  paper  was  provided  by  the  Air  Force  Office  of  Scientific  Research 
under  grant  number  91-0350  and  the  National  Science  Foundation  under  grant 
number  IRI-8916059. 




References 

1.  J.J.  Alferes  and  L.M.  Pereira.  On  logic  program  semantics  with  two  kinds  of 
negation.  In  K.  Apt,  editor,  Proceedings  of  the  Joint  International  Conference  and 
Symposium  on  Logic  Programming,  pages  574-588,  Washington,  D.C.  USA,  Nov 
1992.  The  MIT  Press. 

2. C. Baral. Issues in Knowledge Representation: Semantics and Knowledge Combination. PhD thesis, University of Maryland, College Park, MD 20742 USA, 1991.

3.  C.  Baral,  J.  Lobo,  and  J.  Minker.  Generalized  disjunctive  well-founded  semantics: 
Declarative  semantics.  In  Proceedings  of  the  Fifth  International  Symposium  on 
Methodologies  for  Intelligent  Systems,  pages  465-473,  Knoxville  TN,  USA,  1990. 

4.  C.  Baral,  J.  Lobo,  and  J.  Minker.  Generalized  disjunctive  well-founded  semantics: 
Procedural  semantics.  In  Proceedings  of  the  Fifth  International  Symposium  on 
Methodologies  for  Intelligent  Systems,  pages  456-464,  Knoxville  TN,  USA,  1990. 

5. C. Baral, J. Lobo, and J. Minker. WF³: A semantics for negation in normal disjunctive logic programs. In Proceedings of the Sixth International Symposium on Methodologies for Intelligent Systems, pages 459-468, Charlotte NC, USA, 1991.

6. M. Cadoli and M. Schaerf. A survey on complexity results for non-monotonic logics. Preprint, Università di Roma “La Sapienza”, via Salaria 113, 00198 Roma, Italy, 1992.

7. K.L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 293-322. Plenum, New York, USA, 1978.

8.  J.A.  Fernandez  and  J.  Lobo.  A  proof  procedure  for  stable  theories.  Technical 
Report  CS-TR-3034,  UMIACS-TR-93-14,  University  of  Maryland,  College  Park, 
MD  20742  USA,  1993. 

9. J.A. Fernandez, J. Lobo, J. Minker, and V.S. Subrahmanian. Disjunctive LP + integrity constraints = stable model semantics. Annals of Mathematics and Artificial
Intelligence, 8(3-4), 1993.

10. M. Gelfond. On stratified autoepistemic theories. In Proceedings of AAAI-87,
pages 207-211, 1987.

11. M. Gelfond and V. Lifschitz. The stable model semantics for logic programming.
In R. Kowalski and K. Bowen, editors, Proceedings of the Fifth International Conference and Symposium on Logic Programming, pages 1070-1080, Seattle, WA,
USA, Aug. 1988. The MIT Press.

12. M. Gelfond and V. Lifschitz. Logic programs with classical negation. In D.H.D.
Warren and P. Szeredi, editors, Proceedings of the Seventh International Conference on Logic Programming, pages 579-597, Jerusalem, Israel, June 1990. The MIT
Press.

13.  M.  Gelfond  and  V.  Lifschitz.  Classical  negation  in  logic  programs  and  disjunctive 
databases.  New  Generation  Computing,  9:365-385,  1991. 

14. K. Inoue, M. Koshimura, and R. Hasegawa. Embedding negation as failure
into a model generation theorem prover. In D. Kapur, editor, Proceedings of
the Eleventh International Conference on Automated Deduction, pages 400-415,
Saratoga Springs NY, USA, June 1992. Springer-Verlag.

15. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, second extended
edition, 1987.

16. J. Lobo, J. Minker, and A. Rajasekar. Foundations of Disjunctive Logic Programming. The MIT Press, 1992.




17. V. W. Marek, A. Nerode, and J.B. Remmel. The stable models of a predicate logic
program. In K. Apt, editor, Proceedings of the Joint International Conference and
Symposium on Logic Programming, pages 446-460, Washington, USA, Nov 1992.
The MIT Press.

18. V. W. Marek and M. Truszczyński. Autoepistemic logic. Journal of the ACM,
38(3):588-619, 1991.

19. V. W. Marek and M. Truszczyński. Computing intersection of autoepistemic expansions. In Proceedings of the First International Workshop on Logic Programming
and Non-monotonic Reasoning, pages 37-50. The MIT Press, 1991.

20. J. Minker. On indefinite databases and the closed world assumption. In Proceedings of the Sixth Conference on Automated Deduction, pages 292-308, 1982.

21. J. Minker and A. Rajasekar. A fixpoint semantics for disjunctive logic programs.
Journal of Logic Programming, 9(1):45-74, July 1990.

22. P. Pearce and G. Wagner. Logic programming with strong negation. In
P. Schroeder-Heister, editor, Proceedings of the International Workshop on Extensions of Logic Programming, pages 311-326, Tübingen, FRG, Dec. 1989. Lecture
Notes in Artificial Intelligence, Springer-Verlag.

23.  T.  C.  Przymusinski.  Stable  semantics  for  disjunctive  programs.  New  Generation 
Computing,  9:401-424,  1991. 

24. T.C. Przymusinski. Perfect model semantics. In R. Kowalski and K. Bowen, editors, Proceedings of the Fifth International Conference and Symposium on Logic
Programming, pages 1081-1096, Seattle, WA, USA, Aug. 1988. The MIT Press.

25. T.C. Przymusinski. Extended stable semantics for normal and disjunctive programs. In D.H.D. Warren and P. Szeredi, editors, Proceedings of the Seventh International Conference on Logic Programming, pages 459-477, Jerusalem, Israel,
June 1990. The MIT Press.

26. T.C. Przymusinski. Stationary semantics for disjunctive logic programs and deductive databases. In S. Debray and M. Hermenegildo, editors, Proceedings of
the North American Conference on Logic Programming, pages 42-59, Austin, TX,
USA, Oct. 1990. The MIT Press.

27. A. Rajasekar, J. Lobo, and J. Minker. Weak generalized closed world assumption.
Journal of Automated Reasoning, 5:293-307, 1989.

28. R. Reiter. On closed world data bases. In H. Gallaire and J. Minker, editors, Logic
and Data Bases, pages 55-76. Plenum, New York, 1978.

29. K.A. Ross and R.W. Topor. Inferring negative information from disjunctive
databases. Journal of Automated Reasoning, 4(2):397-424, Dec. 1988.

30. J.S. Schlipf. A survey of complexity and undecidability results in logic programming. In H. Blair, V.W. Marek, A. Nerode, and J. Remmel, editors, Informal Proceedings of the Workshop on Structural Complexity and Recursion-theoretic Methods
in Logic Programming, pages 143-164, Washington, D.C. USA, Nov. 1992.

31. M.H. van Emden and R.A. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733-742, 1976.

32. A. van Gelder, K.A. Ross, and J.S. Schlipf. Unfounded sets and well-founded semantics for general logic programs. In Proceedings of the Seventh ACM Symposium
on Principles of Database Systems, pages 221-230, 1988.

33. A. Yahya and L.J. Henschen. Deduction in non-Horn databases. Journal of Automated Reasoning, 1(2):141-160, 1985.




Model Finding Strategies in Semantically Guided Instance-based Theorem Proving*


Heng  Chu  and  David  A.  Plaisted 


Department of Computer Science
University of North Carolina
Chapel Hill, NC 27599-3175, USA
Email: {chu|plaisted}@cs.unc.edu
Phone numbers: 919-962-{1733|1751}


Abstract. Semantic hyper-linking has recently been proposed [1] as a way to use
semantics in an instance-based theorem prover. The basic procedure is
to generate ground instances of the input clauses until the ground clause
set is unsatisfiable. Models of the satisfiable ground set are constructed
periodically to guide the generation of new ground instances that change
the models, until no model can be constructed. In this paper we discuss
some model finding strategies that can generate useful ground instances,
without using semantics, to change the ground models. We show that
such strategies are helpful and do not increase the search space of
semantic hyper-linking. Moreover, using semantics is often expensive.
Because these model finding strategies do not use semantics, they help
to find proofs earlier and faster.


1  Introduction 

Instance-based theorem proving is a direct application of Herbrand's theorem [4].
The basic idea is to generate ground instances of the input clauses and check the
satisfiability of the ground clause set using a propositional calculus (PC) prover.
Early results [3] were not impressive. Lee and Plaisted [6] recently developed
a fast instance-based theorem proving procedure, hyper-linking, which generates
instances (not necessarily ground) of the input clauses using unification. For a
clause $C$, an instance $C\theta$ is generated if for each literal $L_i$ of $C$ there is a
literal $R_i$ in some other clause such that $L_i\theta = {\sim}R_i\theta$, where $\theta$ is the most general
such substitution. We say that $L_i$ and $R_i$ are hyper-linked. These instances are then
grounded (all variables are replaced by a constant) and submitted to a PC prover
to check satisfiability. Unlike resolution, hyper-linking does not combine literals
from different clauses, so it performs well on non-Horn problems. Results [5, 6]
show that hyper-linking is a useful theorem proving technique.
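The following minimal Python sketch illustrates one hyper-linking step as defined above. Several things in it are assumptions of our own: the term encoding, the helper names (`unify`, `hyperlink`, `ground`), and the constant `c` used for grounding are all hypothetical. The link search also simply tries every combination of candidate literals. The sketch illustrates the definition only. It is not Lee and Plaisted's implementation.

```python
from itertools import product

# Assumed encoding: variables are strings starting with an uppercase letter,
# constants are lowercase strings, f(t1, ..., tn) is ("f", t1, ..., tn),
# and a literal is (positive: bool, atom).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings recorded in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(a, b, s):
    """Extend s to a most general unifier of a and b, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def substitute(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, s) for a in t[1:])
    return t

def hyperlink(clause, others):
    """Yield instances C-theta in which every literal L_i of `clause` is linked
    to some literal R_i of `others` with L_i theta = ~R_i theta.  Literals in
    `others` are assumed to be renamed apart from `clause` and from each other."""
    for links in product(others, repeat=len(clause)):
        s = {}
        for (sign, atom), (rsign, ratom) in zip(clause, links):
            # Complementary signs are required; the atoms must unify.
            s = unify(atom, ratom, s) if sign != rsign else None
            if s is None:
                break
        if s is not None:
            yield [(sign, substitute(atom, s)) for sign, atom in clause]

def ground(t, const="c"):
    """Replace every remaining variable by a single constant."""
    if is_var(t):
        return const
    if isinstance(t, tuple):
        return (t[0],) + tuple(ground(a, const) for a in t[1:])
    return t
```

Take the clause {~p(X), q(X, Y)} with the literals p(a) and ~q(Z, b) from other clauses. The only hyper-linked instance is {~p(a), q(a, b)}. `ground` would then replace any variables that remain.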

*  This  research  was  partially  supported  by  the  National  Science  Foundation  under 
grant  CCR-9108904 




Chu and Plaisted [1] developed semantic hyper-linking to use semantics with
hyper-linking. Fast proofs of hard theorems such as IMV and AM8, which ordinary
hyper-linking cannot prove because of the large search space, show that semantic
hyper-linking provides a practical method for using semantics in theorem proving.

Semantic hyper-linking differs from hyper-linking. In semantic hyper-linking,
models found by the PC prover for the ground clause sets are used
to guide the next round of semantic hyper-linking, which generates more ground
instances of the input clauses, until the ground clause set is unsatisfiable. Semantic
hyper-linking uses user-provided semantics, and semantic checks are often time
consuming. We therefore want to check, without using semantics, that the models found
for the ground clause set cause no contradictions with the (non-ground) input
clauses. However, this problem is undecidable in general.

In this paper, we discuss the model finding strategies, called incremental
model finding, used in semantic hyper-linking to search for useful models of the
ground clause set. In the next two sections, we briefly describe the PC prover
and semantic hyper-linking. We then discuss the incremental model finding
strategies in detail. In the last section, we conclude the paper by discussing the
results of using the model finding strategies to prove two hard theorems.


1.1  Propositional  Calculus  (PC)  Prover 


The PC prover [5] used in hyper-linking is a modified Davis-Putnam procedure [2].
The basic step is to apply case analysis and simplification (recursively)
to the ground clause set. The simplification rules are given below, where $S$ is a
conjunction of clauses and $A$ is a disjunction of literals.

$$\frac{S \wedge (A \vee \mathit{TRUE})}{S}\ \text{(Unit Deletion)} \qquad \frac{S \wedge (A \vee \mathit{FALSE})}{S \wedge A}\ \text{(Unit Simplification)}$$

$$\frac{S \wedge \mathit{TRUE}}{S} \qquad \frac{S \wedge \mathit{FALSE}}{\mathit{FALSE}}$$


Backtracking, either to another case or to a previous case analysis, occurs when
simplification gives FALSE (a contradiction). When $S$ becomes TRUE, a model is
constructed as the set of literals chosen for case analysis.

The PC prover procedure is a preorder traversal of a case tree [5]. For a
ground clause set $S$, let $A$ be the set of distinct atoms occurring in $S$. A case
tree is a binary tree $T$ with the following properties. Each edge is labeled with an atom
from $A$ or its negation. Each non-terminal node has two subtrees, reached by two edges
labeled $L$ and ${\sim}L$ respectively, where $L \in A$. No path from the root contains a
complementary pair. Each node of a case tree can therefore be seen as containing two
cases: TRUE and FALSE of a literal $L$. The case tree is not unique, because the
literals for case analysis can be chosen in different orders.




1.2 Semantic Hyper-Linking

In this section we briefly describe semantic hyper-linking. More detailed discussions are in [1].

Semantic hyper-linking is a refutation procedure that employs semantics in an
instance-based theorem prover. A semantic structure $S_m$ is given as input together with
the axioms and the negation of the theorem. The structure contains the domain
and the interpretation for the theorem. In general, a structure $S$ can be seen as a
(possibly infinite) set of all ground literals true in $S$. If the input clause set
is unsatisfiable, we can generate some ground instances that are false in $S_m$.
We then find a model $M_1$ for those ground instances. Since these instances are false in $S_m$
but true in $M_1$, there must be some literals true in $M_1$ but false in $S_m$. We call these
literals the eligible literals.
$S_m$ and $M_1$ constitute a new structure $S_1$, which is the same as $S_m$ except for
the eligible literals. The eligible literals are then used as guidance to generate new
ground instances that are false in $S_1$. A new model $M_2$ is found for the new set
of ground clauses. Literals true in $M_2$ but false in $S_1$ are the new eligible literals.
$M_2$ and $S_m$ constitute another new structure $S_2$. We exhaustively search for
models $M_3, M_4, \ldots$ in this fashion until the ground clause set is unsatisfiable
and a proof is found. Each $M_i$ and $S_m$ constitute a new structure $S_i$, and semantic
hyper-linking generates ground instances that are false in $S_i$. The process of generating
ground instances from $S_i$ is called a round. The input semantic structure $S_m$ is
used to filter out unwanted instances and prune the search space dramatically.

When a model $M_i$ is found for the ground clause set, ground instances can
sometimes be derived from the input (non-ground) clauses that contradict
$M_i$ and yield a new model $M_{i+1}$. Such changes to $M_i$ are syntactic and
generate ground instances without using semantics. They are therefore faster, since
using semantics is usually expensive. In the next section we discuss some
strategies used to contradict a model for the ground clauses by generating ground
instances or ground logical consequences of the input clauses.


2  Model  Finding 


In searching for a model for the ground clause set, we use a method called
incremental model finding. The old model is kept, and we search for a new model
based on it. The idea is to continue the PC prover from the node in
the case tree where the old model was last found. New models are constructed
incrementally in such a way that no old model ever becomes a subset of a new
model, and the same path from the root of the case tree is never visited twice.
The model finding is incremental because new ground clauses may gradually
“expand” the case tree (which is usually infinite).

During the search for a model of the ground clause set, three different methods
are applied to detect and discard models that cause contradictions: model filtering,
model literal replacement, and UR resolution. These methods check whether a model
contradicts the input clauses. In general, however, this task is undecidable.


2.1  Incremental  Model  Finding 

The incremental model finding method is a modified PC prover. One major
difference is that it starts with an old model found earlier and builds a new
model based on the old one, whereas the PC prover in Lee's prover stops as soon as
the clause set is found to be satisfiable. Another difference is that we maintain
only one case tree (possibly infinite) throughout the whole search for a proof. In
each incremental model finding step, backtracking takes place at failure nodes (which
contain FALSE). Subtrees of failure nodes are discarded, paths containing failure
nodes are never visited again, and the case tree is expanded when necessary.

Like  the  PC  prover,  model  finding  can  be  thought  of  as  a  preorder  traversal 
of  the  case  tree  starting  from  the  root.  The  two  sub-cases  of  each  node  represent 
the  case  analysis  on  one  literal:  one  is  true  in  the  user-provided  semantics,  the 
other  is  false.  To  search  for  a  new  model  when  new  ground  clauses  are  generated, 
we  continue  from  the  node  where  the  previous  model  is  found.  Additionally,  in 
case  analysis  the  true  case  is  always  chosen  first.  This  is  a  heuristic  method  to 
reduce  the  number  of  eligible  literals  in  the  model. 

For an example of how incremental model finding works, consider the
case tree [1] in Fig. 1 for a set $S$ of ground clauses, where × indicates a failure node.
The model $M_1 = \{ s(b), {\sim}s(i(b)), {\sim}s(e), p(e, i(b), i(b)) \}$ is found at node $N_1$. With
$M_1$, suppose semantic hyper-linking generates a new clause
$C = \{ s(e), {\sim}s(b), {\sim}s(b), {\sim}p(b, i(b), e) \}$, which contradicts $M_1$. If we continue the
model finding at node $N_1$ after $C$ is added, we get the case tree in Fig. 2. Note that at
node $N_2$, a new model $M_2 = M_1 \cup \{ {\sim}p(b, i(b), e) \}$ is found. The case tree is
“expanded” in Fig. 2 to include case analysis on the new literal $p(b, i(b), e)$.


2.2  Model  Filtering 

After a model is found for the ground clause set, it might directly contradict
the input clauses. That is, there might be ground instances (not yet generated)
of the input clauses that are falsified by the model. We use model filtering to filter out
such models and to make sure that the model we find also satisfies the input clauses.
The filtering idea is simple: we use the input clauses to hyper-link with
the literals in the model. Suppose an input clause $C$ is fully hyper-linked with the
model literals to give $C\theta$, where $\theta$ is a ground substitution (because all literals
in the model are ground). Then the model contradicts $C\theta$. We add $C\theta$, which
is a ground instance of the input clause $C$, to the ground clause set to avoid
generating models that cause the same contradiction.

For example, suppose we obtain a model $\{ p(a), q(b), {\sim}r(a, b), \ldots \}$, and suppose
there exists an input clause $\{ {\sim}p(X), {\sim}q(Y), r(X, Y) \}$. The model contradicts
this clause and thus should not be kept, so the model finding should backtrack
to find another model. The ground clause $\{ {\sim}p(a), {\sim}q(b), r(a, b) \}$ is added to
the ground clause set to avoid generating any other model containing the literals
$p(a)$, $q(b)$, and ${\sim}r(a, b)$.

If more than one input clause is contradicted by the model, we add only the
smallest of the contradicted ground instances to the ground clause set. In general it is not
necessary to add all contradicted instances, because the model will be changed,
which may well remove the other contradictions. Another reason is that our model
finding is fast, so contradicted ground instances will be added eventually if they
still cause contradictions in the new models.

In  fact  those  contradicted  ground  instances  can  be  found  in  later  rounds  of 
semantic  hyper-linking  if  model  filtering  is  not  used.  However,  model  filtering 
detects  the  contradictions  as  early  as  possible  to  save  some  rounds  of  semantic 
hyper-linking  and  find  the  proofs  sooner. 

Manthey and Bry [7] used the same technique to check a model in their
model-generation theorem prover SATCHMO, although they generate the ground
literals that constitute a model for the input clauses in a different way. In
SATCHMO the contradicted input clause instances are not saved; the contradiction
only causes backtracking in the search for a proof.




2.3  Model  Literal  Replacement 

Lee and Plaisted [5, 6] use predicate replacement to generate instances of input clauses
faster by partially hyper-linking the literals in a clause. Replace rules are
constructed from input clauses in order to apply predicate replacement. A replace rule is
a clause

$$\{\, {\sim}C_1, {\sim}C_2, \ldots, {\sim}C_n, L_1, L_2, \ldots, L_m \,\}$$

in the format

$$C_1, C_2, \ldots, C_n \rightarrow L_1, L_2, \ldots, L_m$$

where each $C_i$, $1 \le i \le n$, and each $L_j$, $1 \le j \le m$, can be a positive or negative literal.
The literals ${\sim}C_1, {\sim}C_2, \ldots, {\sim}C_n$, corresponding to $C_1, \ldots, C_n$ to the left of the arrow,
are the distinguished literals.

In predicate replacement, only the distinguished literals in a replace rule are
hyper-linked with literals from other clauses. This provides a faster way to generate
instances using hyper-linking. Users can apply predicate replacement repeatedly,
without limit, or just once. Predicate replacement has been used to prove
many hard theorems in a very short time (see [5]), which indicates that predicate
replacement has a lot of potential in general theorem proving. However, devising the
replace rules often requires a deep understanding of the theorems. As a
result, the replace rules are not natural, and it is sometimes hard to devise
appropriate replace rules that yield a proof quickly.

We  use  a  simplified  predicate  replacement;  we  only  apply  the  replacement 
using  the  model  literals  and  the  idea  is  slightly  different.  The  purpose  is,  instead 
of  generating  instances,  to  again  check  if  the  model  causes  any  contradiction 
through  predicate  replacement. 

The  replace  rules  we  use  are  natural  replace  rules  and  can  be  automatically 
generated  from  the  input  clauses.  Thus  human  intervention  is  not  needed.  A 
replace  rule  is  natural  if 

1. it has a unit consequence, namely, it has the format

$$C_1, C_2, \ldots, C_n \rightarrow L$$

where $L$ and each $C_i$ are literals;

2. each variable in $L$ occurs in some $C_i$.

Once all $C_i$ in a natural replace rule are hyper-linked with ground literals,
$L$ becomes ground after the proper substitution $\theta$ is applied. If every $C_i$ is linked
with a literal from a unit clause, we obtain a unit clause $\{ L\theta \}$, because all the literals
${\sim}C_i\theta$ are deleted. In this case, we can think of the replacement as deriving a
unit consequence $L\theta$ from the single facts $C_i\theta$. We use the input clauses to generate all
possible natural replace rules.

Natural replace rules are used as follows. We think of the literals in the model
as literals from (ground) unit clauses (called unit literals), which can be seen as
single facts. We then use the natural replace rules to generate more unit literals,
which can be seen as unit consequences or derived single facts. New unit literals
are checked for contradictions with the old ones. We can repeat the
replacement more than once, just like the repeated replace rules used in [5, 6].
However, repeated natural replacement may not terminate. For example, the
natural replace rule $p(X) \rightarrow p(f(X))$ keeps generating larger and larger
literals $p(f(f(\ldots)))$ and never terminates. A bound on the number of replacements
or on the number of derived unit literals should be used to guarantee
termination. The natural replacement procedure is described in Fig. 3.


Algorithm Natural_Predicate_Replacement(M, Replace, Bound, S)

Input
    M: set of literals in the model
    Replace: set of natural replace rules
    Bound: bound on the number of replacements or of derived literals
Output
    S: set of literals in M that cause a contradiction
Return Values
    FALSE: if no contradiction is found
    TRUE: if a contradiction is found

begin
    {S_unit is a set of unit literals}
    S_unit := M
    while Bound is not exceeded do
        for each natural replace rule R = C_1, ..., C_n → L in Replace do
            for each L obtained by hyper-linking all C_i with literals in S_unit do
                if L contradicts any literal in S_unit then
                    S := { literals in M that derive L and ~L }
                    return TRUE
                else
                    S_unit := S_unit ∪ { L }
    return FALSE
end


Fig. 3. Algorithm: Natural_Predicate_Replacement

For each derived literal $L$, we maintain the set of model literals from which $L$ is
generated by natural replacement. A contradiction is found if a newly derived literal $L$
contradicts an existing literal ${\sim}L$. Suppose $L$ depends on the set $S_L$ of model
literals, and ${\sim}L$ depends on the set $S_{\sim L}$. The algorithm returns $S = S_L \cup S_{\sim L}$. Then
we add the clause $C = \{ M : {\sim}M \in S \}$ to the ground clause set. Adding
$C$ as a ground clause is justified because $C$ is a logical consequence of the input clauses:
any model contradicted by $C$ would also contradict the input clauses. Thus $C$
can be added to the ground clause set. This not only changes the model but
also avoids generating any later model that would cause the same contradiction.

Let us look at a simple example. Suppose model literals $L_1$ and $L_2$ generate
$L$ using the natural replace rule $L_1, L_2 \rightarrow L$, and $L_3$ and $L_4$ generate ${\sim}L$ using
the natural replace rule $L_3, L_4 \rightarrow {\sim}L$. Then $L$ depends on the model literals
$\{ L_1, L_2 \} = S_L$, and ${\sim}L$ depends on $\{ L_3, L_4 \} = S_{\sim L}$. Since $L$ and ${\sim}L$ cause
a contradiction, the algorithm returns $S = S_L \cup S_{\sim L} = \{ L_1, L_2, L_3, L_4 \}$.
We then add the clause $C = \{ {\sim}L_1, {\sim}L_2, {\sim}L_3, {\sim}L_4 \}$ to the ground clause set. To
illustrate (informally) that $C$ is a logical consequence of the input clauses, we
can think of $C$ as the resolvent obtained by resolving the two replace rules on $L$ and ${\sim}L$.
Since the two replace rules are actually input clauses, it follows that $C$ is
a logical consequence of the input clauses. $C$ is added to the ground clause set
to avoid generating any model that contains all of $L_1, L_2, L_3, L_4$.


2.4  UR  Resolution 


Unit resolution [9] is resolution involving one unit clause $U = \{ L \}$ and another
clause $C$ (which could also be a unit clause). Suppose $C$ has a literal $M$ and
there exists a most general unifier $\theta$ such that $L\theta = {\sim}M\theta$. After applying unit
resolution to $U$ and $C$, the resolvent is $(C - \{ M \})\theta$. Since a unit clause has no
literals other than the one resolved upon, the resolvent of a unit resolution does not
introduce new literals. This is compatible with the philosophy of hyper-linking
that literals from different clauses should not be combined.

UR (Unit Resulting) resolution [10] is a sequence of unit resolutions in which
the resolvent (called a UR resolvent) is a unit clause. It can be seen as a multi-step
unit resolution and is particularly helpful because the UR resolvents can
then be used in further UR resolution.

We use UR resolution to detect a contradiction caused by the model found
by model finding. The basic idea is to check whether derived UR resolvents contradict
any literal in the model. We keep generating new UR resolvents, which are unit
clauses, and use them in further UR resolutions. Suppose a derived UR resolvent $\{ R \}$
contradicts a model literal $L$, that is, there exists a ground substitution $\theta$ such
that $R\theta = {\sim}L$. We then add $\{ R \}$ to the clause set (so that no other model containing $L$
can be generated) and look for another model.
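A ground-only Python sketch of this check follows. It rests on the following illustrative assumptions:

- All clauses are ground, so unification reduces to equality.
- Literals are strings, with `~` for negation.
- The starting unit clauses come from the input rather than from the model.
- The names and the default bound are hypothetical.

The bound caps the number of derived resolvents, since UR resolution need not terminate.

```python
# Ground-only sketch (assumption): literals are strings, "~p" negates "p",
# and clauses are sets of literals, so unification reduces to equality.

def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def ur_resolvents(clauses, units, bound):
    """Resolve each clause against unit clauses.  When all literals of a clause
    but one are cancelled, the remaining literal is a UR resolvent; it is then
    used as a unit itself.  At most `bound` resolvents are derived."""
    units = set(units)
    derived = []
    changed = True
    while changed and len(derived) < bound:
        changed = False
        for c in clauses:
            rest = [x for x in c if neg(x) not in units]
            if len(rest) == 1 and rest[0] not in units:
                units.add(rest[0])
                derived.append(rest[0])
                changed = True
                if len(derived) >= bound:
                    break
    return derived

def ur_check(clauses, units, model, bound=100):
    """Return the first UR resolvent R whose complement ~R is in the model,
    else None; the unit clause {R} would then be added to the clause set."""
    for r in ur_resolvents(clauses, units, bound):
        if neg(r) in model:
            return r
    return None
```

For example, from the clauses {~a, b} and {~b, ~c} and the unit clause {a}, the sketch derives b and then ~c. The resolvent ~c contradicts any model containing c.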

A UR resolvent $\{ R \}$ is relevant if $R$ can unify with a model literal $L$. Derived
UR resolvents are kept temporarily to derive more UR resolvents. However, after
UR resolution, we keep only the relevant UR resolvents. This heuristic keeps the number of
derived unit clauses small while saving the resolvents relevant
to the development of the proof.

UR  resolution  and  the  natural  predicate  replacement  seem  quite  similar.  The 
major  difference  is  that  we  might  save  intermediate  UR  resolvents  since  they  are 
logical  consequences  of  input  clauses.  In  natural  predicate  replacement  we  do 
not  save  any  intermediate  results  because  literals  in  the  model  are  not  really 
from  unit  clauses. 

Like  natural  predicate  replacement,  UR  resolution  might  not  terminate.  So 
a  bound  on  the  number  of  derived  resolvents  needs  to  be  used. 




3  Discussion 

The search space of semantic hyper-linking grows when new literals are generated
(see, for example, Figs. 1 and 2). Also, in each round of semantic hyper-linking,
ground clauses are generated to change the model (and thus the new semantic structure)
until the ground set is unsatisfiable [1]. We have described three model
finding strategies that change the models without using semantics, which is often
expensive. At the same time, they do not increase the search space, because they
only generate clauses containing model literals or their negations.

These refinements to incremental model finding retain the completeness
of semantic hyper-linking. They also maintain soundness, since they all generate
ground logical consequences of the input clauses. More importantly, they
have a positive impact on the performance of semantic hyper-linking. In the proofs
we have obtained so far, the model finding strategies almost always help.

To illustrate the power of the model finding strategies, let us look at two
examples: IMV (the intermediate value theorem in real analysis) and AM8 (the attaining
maximum theorem in real analysis). Both are very hard theorems. They
often generate search spaces so large that many powerful general
theorem provers (for example, OTTER [8] and CLIN [5]) cannot handle them. With
proper semantics, semantic hyper-linking can prove IMV in 39 seconds using 4
rounds, and AM8 in 492 seconds using 7 rounds, on a DEC5500. However, if the
model finding strategies are not used, IMV runs for over 10,000 seconds without
finding the proof, and AM8 takes 2,500 seconds and 21 rounds.


           Filtering   Replacement   UR
Round 1    0           0             0
Round 2    0           1 (1)         2 (0)
Round 3    1 (1)       0             0
Round 4    3 (2)       2 (1)         0
Total      4 (3)       3 (2)         2 (0)

(a) IMV proof


           Filtering   Replacement
Round 1    0           0
Round 2    0           0
Round 3    0           0
Round 4    1           0
Round 5    3           0
Round 6    2           1
Round 7    11          0
Total      17          1

(b) AM8 proof


Fig.  4.  Ground  clauses  generated  by  model  finding 


The proof of IMV is found from an unsatisfiable ground clause set containing
12 clauses. Figure 4(a) shows the number of clauses generated by each model
finding strategy during the proof of IMV. Numbers in parentheses give the number of
clauses in the unsatisfiable set, namely the clauses useful for the proof. Of the
12 clauses in the unsatisfiable set, 5 are generated by the model finding
strategies, which generate 9 ground clauses in total.

The proof of AM8 is found by the UR resolution step of model finding in round
7. Figure 4(b) shows the numbers of ground clauses generated by model
finding.

References 

1. Heng Chu and David A. Plaisted. Semantically guided first order theorem proving
using hyper-linking, 1992. Submitted.

2. M. Davis and H. Putnam. A computing procedure for quantification theory.
J. ACM, 7(3):201-215, 1960.

3. P. C. Gilmore. A proof method for quantification theory: its justification and
realization. IBM J. Res. Dev., pages 28-35, 1960.

4.  J.  Herbrand.  Researches  in  the  theory  of  demonstration.  In  J.  van  Heijenoort, 
editor.  From  Frege  to  Godel:  a  source  book  in  Mathematical  Logic,  1879- 1931 , 
pages  525-581.  Harvard  Univ.  Press,  1974. 

5.  Shie-Jue  Lee.  CLIN:  An  .Automated  Reasoning  System  Using  Clause  Linking.  PhD 
thesis.  University  of  North  Carolina  at  Chapel  Hill.  1990. 

6.  Shie-Jue  Lee  and  David.  A.  Plaisted.  Eliminating  duphcation  with  the  hyper¬ 
linking  strategy.  J.  .Automated  Reasoning,  9:2b-42,  1992. 

7.  Rainer  Manthey  and  Fran<;ois  Bry.  SATCHMO;  a  theorem  prover  implemented  in 
Prolog.  In  E.  Lusk  and  R.  Overbeek,  editors,  Proc.  of  CADE-9,  pages  415-434, 
Argonne,  IL,  1988. 

8.  William  W.  McCune.  OTTER  S.O  Users  Guide.  Argonne  National  Laboratory, 
Argo.nne,  Illinois,  March  1990. 

9.  L.  Wos,  D.  Carson,  and  G.  Robinson.  The  unit  preference  strategy  in  theorem 
proving.  In  Proceedings  of  the  AFIPS  Conference  26,  pages  615-621,  Washington, 
D.C.,  1964.  Spartan  Books. 

10.  Larry  Wos,  Ross  Overbeek,  Ewing  Lusk,  and  Jim  Boyle.  Automated  Reasoning: 
Introduction  and  .Applications.  Prentice-Hall,  Inc.,  Englewood  Cliffs,  NJ,  1984. 




An Expressive Three-valued Logic with Two Negations

Douglas R. Busch*

DSV,  Stockholm  University  and  IDA,  Linkoping  University 

February  17,  1993 

Abstract

This paper presents a flexible, expressive system K3~ of three-valued logic
with two types of negation, having a sequent axiomatization which is an
extension of the kind originally presented for Kleene’s strong three-valued
logic by Wang. The system K3~ turns out to be closely related to Lukasiewicz’s
three-valued logic. Applications: (1) Erik Sandewall has recently formulated a
non-monotonic variant of three-valued logic. The non-monotonic “entailment”
relation of his system can be expressed by a kind of “circumscription” formula
in K3~. (2) J. Shepherdson has suggested that a hybrid three-valued
intuitionistic logic could be useful in connection with Kunen’s modification of
Fitting’s three-valued version of the Clark Completion semantics for logic
programs with negation. A suitable “intuitionistic fragment” of K3~ is obtained
by allowing in proofs only sequents with at most one formula in the succedent.

1 Axiomatizing three-valued logic

1.1 Kleene’s strong three-valued logic

Kleene’s strong three-valued logic [6] is characterized by the three-valued
truth-tables:


A 

t 

f 

u 

t 

t 

f 

u 

f 

f 

f 

f 

u 

u 

f 

u 

t 

f 

f 

t 

u 

u 

t 

f 

u 

t 

t 

f 

u 

f 

t 

t 

t 

u 

t 

u 

u 

V 

t 

f 

u 

t 

t 

t 

t 

f 

t 

f 

u 

u 

t 

u 

u 

The implication symbol is subscripted to distinguish it from some other
competing three-valued notions of implication, notably Lukasiewicz implication.

*Supported by Swedish agencies STU, STUF and TFR.




Kleene’s logic has an axiomatization, but it is a sequent axiomatization. This
avoids the problem that there are no “3-tautologies” to axiomatize, by
axiomatizing instead the notion of entailment or semantic consequence, in a
suitable three-valued sense, between formulas.¹

To be precise, we define K3 to be the set of sequents $\Gamma \Rightarrow \Delta$,
containing formulas built up in the usual way using $\neg$, $\wedge$ and $\vee$,
satisfying $\Gamma \models_3 \Delta$, where this means that every 3-assignment
which makes every formula in $\Gamma$ take the value t also makes at least one
formula in $\Delta$ take the value t.²

Now there are no “3-tautologies” in Kleene’s strong three-valued logic, not
even $p \to_K p$, but there are some significant 3-entailments, for example:

$$p \models_3 p \qquad p, q \models_3 p \wedge q \qquad p \models_3 p \vee q$$
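The truth-tables and the definition of $\models_3$ can be checked by brute force. The following Python sketch is an illustration, not part of the paper: it assumes the encoding f = 0, u = 1, t = 2 (so that strong Kleene conjunction and disjunction become min and max), represents formulas as Python functions of an assignment, and uses hypothetical helper names such as `entails3`.

```python
from itertools import product

# Encoding chosen for this sketch: f = 0, u = 1, t = 2.
# With it, strong Kleene conjunction is min, disjunction is max,
# and negation is reflection about u.
F, U, T = 0, 1, 2


def neg(a):
    """Kleene negation: swaps t and f, leaves u fixed."""
    return 2 - a


def conj(a, b):
    return min(a, b)


def disj(a, b):
    return max(a, b)


def impl_k(a, b):
    """Kleene's strong implication, neg(a) or b (so u ->K u = u)."""
    return disj(neg(a), b)


def entails3(gamma, delta, atoms):
    """Gamma |=3 Delta: every 3-assignment making all formulas in gamma
    take the value t makes at least one formula in delta take the value t.
    Formulas are functions from an assignment (dict atom -> value) to a value."""
    for values in product((F, U, T), repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(g(v) == T for g in gamma) and not any(d(v) == T for d in delta):
            return False
    return True


p = lambda v: v["p"]
q = lambda v: v["q"]

# No 3-tautologies: p ->K p takes the value u when p is u.
print(entails3([], [lambda v: impl_k(p(v), p(v))], ["p"]))        # False
# ... but there are significant 3-entailments.
print(entails3([p], [p], ["p"]))                                    # True
print(entails3([p, q], [lambda v: conj(p(v), q(v))], ["p", "q"]))  # True
print(entails3([p], [lambda v: disj(p(v), q(v))], ["p", "q"]))     # True
```

Enumerating all $3^n$ assignments is only practical for a handful of atoms, which is one reason to prefer a syntactic characterization.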

1.2  A  cut-free  axiomatization 

For  classical  logic  it  is  well-known  that  Beth’s  tableau  method  and  its  notational 
variants  in  terms  of  semantic  trees  and  the  like  are,  in  a  sense,  equivalent  to 
Gentzen’s  sequent  calculus  [14]. 

Van Benthem [16] sketches how to modify the Beth tableau method so that it
still works in the 3-valued or partial context. From the modified tableau method
one can read off an axiomatization of K3. The version presented here is further
modified (slightly) to ensure invertibility with respect to the Kleene semantics.

We now interpret the right-hand side of a Beth tableau as “not true”, i.e. f
or u. The usual reduction rules for conjunction and disjunction still work, and
they are retained unchanged. Both of the usual reduction steps for negation are
dropped. These correspond to the sequent rules:

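$$\frac{\Gamma, \alpha \Rightarrow \Delta}{\Gamma \Rightarrow \neg\alpha, \Delta} \qquad \frac{\Gamma \Rightarrow \alpha, \Delta}{\Gamma, \neg\alpha \Rightarrow \Delta}$$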

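The reduction step that converts a negated $\neg\alpha$ on the right to an
un-negated $\alpha$ on the left corresponds to the first of these, the rule
$\Rightarrow\neg$. This rule is not even sound with respect to the three-valued
semantics: consider $p \Rightarrow p$ and $\Rightarrow p, \neg p$; the first is
valid, but the second, obtained from it by $\Rightarrow\neg$, fails when p is u.
The dual Beth reduction step converts a negated $\neg\alpha$ on the left to an
un-negated $\alpha$ on the right. It corresponds to the sequent rule
$\neg\Rightarrow$, which is sound, but not invertible with respect to the Kleene
semantics: consider $\Rightarrow p, \neg p$ and $\neg\neg p \Rightarrow p$ (the
conclusion is valid, the premise is not).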

When the Beth tableau approach is applied in the context of classical two-valued
logic, these two rules for negation ensure that all negation signs are eventually
eliminated: a negated formula on one side sheds its negation sign and
moves over to the other side of the tableau. Here in three-valued or partial logic,
on our approach, this never happens. Negated formulas, either on the left or the

¹According to Feferman [4], the first presentation of such a sequent axiomatization of
Kleene’s logic is due to Wang [17], but the basic idea has been rediscovered several times.

²This notion of entailment, which is a direct, simple-minded generalization of the ordinary
notion of semantic consequence in two-valued logic, has evidently [16] been independently
formulated by Hans Kamp under the name “strong consequence”.




right side of a tableau, are simplified by having the negations “pushed inwards”
until they apply only to atoms. This is brought about by using Double-Negation
Elimination and De Morgan’s Laws as rewrite rules:

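$$\neg\neg\alpha \mapsto \alpha \qquad \neg(\alpha \wedge \beta) \mapsto \neg\alpha \vee \neg\beta \qquad \neg(\alpha \vee \beta) \mapsto \neg\alpha \wedge \neg\beta$$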
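Because each rewrite rule preserves the Kleene value of a formula in all three truth values, it can be applied anywhere without counting as an inference. A minimal Python sketch, assuming a tuple representation of formulas chosen only for this illustration (the function names are hypothetical), implements the rules and checks the value-preservation claim on an example:

```python
from itertools import product

# Formulas as nested tuples (representation chosen for this sketch):
# ("atom", name), ("not", f), ("and", f, g), ("or", f, g).


def nnf(f):
    """Push negation inwards until it applies only to atoms, using
    double-negation elimination and De Morgan's laws as rewrite rules."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag in ("and", "or"):
        return (tag, nnf(f[1]), nnf(f[2]))
    g = f[1]                                   # f is ("not", g)
    if g[0] == "atom":
        return f                               # already a literal
    if g[0] == "not":                          # not not a  ->  a
        return nnf(g[1])
    if g[0] == "and":                          # not(a and b) -> not a or not b
        return ("or", nnf(("not", g[1])), nnf(("not", g[2])))
    if g[0] == "or":                           # not(a or b) -> not a and not b
        return ("and", nnf(("not", g[1])), nnf(("not", g[2])))
    raise ValueError(f"unknown connective: {g[0]!r}")


def value(f, v):
    """Strong Kleene value of f under v, with f = 0, u = 1, t = 2."""
    tag = f[0]
    if tag == "atom":
        return v[f[1]]
    if tag == "not":
        return 2 - value(f[1], v)
    a, b = value(f[1], v), value(f[2], v)
    return min(a, b) if tag == "and" else max(a, b)


def same_value_everywhere(f, g, atoms):
    """True if f and g agree under every 3-assignment to atoms."""
    return all(
        value(f, dict(zip(atoms, vals))) == value(g, dict(zip(atoms, vals)))
        for vals in product((0, 1, 2), repeat=len(atoms))
    )


p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
f = ("not", ("and", p, ("not", ("or", q, r))))
print(nnf(f))   # ('or', ('not', ('atom', 'p')), ('or', ('atom', 'q'), ('atom', 'r')))
print(same_value_everywhere(f, nnf(f), ["p", "q", "r"]))   # True
```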

The “atomic negation” approach: It is possible to build into the definition
of formula for three-valued logic that negation applies only to atoms. We
can define a reduced formula to be one built up from literals in the usual way
using $\wedge$ and $\vee$, then view what would ordinarily be the formulas as
“syntactic sugar” standing for reduced formulas. Now the steps driving $\neg$
inwards are not inferences; they are formally just part of the manipulation of
the syntax. They can all be performed before running the tableau procedure, or
they can be interspersed among tableau reduction steps.

The criterion for closure of a (sub-)tableau needs to be modified. Now a
(sub-)tableau closes if it contains the same literal (atom or negated atom) on
each side, or if it contains a complementary pair of literals $p, \neg p$ on the left.
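The modified closure test can be stated in a few lines. This Python sketch assumes literals are written as strings, with a leading "¬" for a negated atom; the function names are hypothetical.

```python
def complement(lit):
    """Literals are strings: "p" for an atom, "¬p" for its negation."""
    return lit[1:] if lit.startswith("¬") else "¬" + lit


def closes(left, right):
    """Modified K3 closure test for a (sub-)tableau whose formulas are all
    literals: it closes if some literal occurs on both sides, or if a
    complementary pair p, ¬p occurs on the left.  A complementary pair on
    the right does NOT close it, since p = u makes both p and ¬p "not true"."""
    left, right = set(left), set(right)
    if left & right:
        return True
    return any(complement(lit) in left for lit in left)


print(closes(["p"], ["p"]))          # True   (p => p)
print(closes(["¬p"], ["¬p"]))        # True   (¬p => ¬p)
print(closes(["p", "¬p"], []))       # True   (p, ¬p =>)
print(closes([], ["p", "¬p"]))       # False  (=> p, ¬p fails when p = u)
print(closes(["p"], ["¬p"]))         # False
```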

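PROPOSITION: $\Gamma \Rightarrow \Delta \in$ K3 iff a tableau starting with $\Gamma$ on the left
and $\Delta$ on the right closes, equivalently iff $\Gamma \Rightarrow \Delta$ is provable from the following
sequent calculus postulates.

Axioms, for p an atom:

$$p \Rightarrow p \qquad \neg p \Rightarrow \neg p \qquad p, \neg p \Rightarrow$$

Introduction Rules: the usual rules for conjunction and disjunction:

$$\frac{\Gamma, \phi, \psi \Rightarrow \Delta}{\Gamma, \phi \wedge \psi \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \Delta, \phi \qquad \Gamma \Rightarrow \Delta, \psi}{\Gamma \Rightarrow \Delta, \phi \wedge \psi}$$

$$\frac{\Gamma, \phi \Rightarrow \Delta \qquad \Gamma, \psi \Rightarrow \Delta}{\Gamma, \phi \vee \psi \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \phi, \psi, \Delta}{\Gamma \Rightarrow \phi \vee \psi, \Delta}$$

Structural Rules: Weakening and Cut:

$$\frac{\Gamma \Rightarrow \Delta}{\Gamma, \Theta \Rightarrow \Psi, \Delta} \qquad \frac{\Gamma, \alpha \Rightarrow \Delta \qquad \Gamma \Rightarrow \alpha, \Delta}{\Gamma \Rightarrow \Delta}$$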


Cut  Elimination  holds,  as  for  classical  logic.  Any  sequent  which  is  provable  using 
Cut  also  has  a  “direct”  proof  that  does  not  use  Cut. 

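Special rules for negation. Instead of incorporating the treatment of negation
into the syntax, we can get the same effect by postulating six special rules
for $\neg$:

$$\frac{\Gamma, \alpha \Rightarrow \Delta}{\Gamma, \neg\neg\alpha \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \alpha, \Delta}{\Gamma \Rightarrow \neg\neg\alpha, \Delta}$$

$$\frac{\Gamma, \neg\alpha \Rightarrow \Delta \qquad \Gamma, \neg\beta \Rightarrow \Delta}{\Gamma, \neg(\alpha \wedge \beta) \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \neg\alpha, \neg\beta, \Delta}{\Gamma \Rightarrow \neg(\alpha \wedge \beta), \Delta}$$

$$\frac{\Gamma, \neg\alpha, \neg\beta \Rightarrow \Delta}{\Gamma, \neg(\alpha \vee \beta) \Rightarrow \Delta} \qquad \frac{\Gamma \Rightarrow \neg\alpha, \Delta \qquad \Gamma \Rightarrow \neg\beta, \Delta}{\Gamma \Rightarrow \neg(\alpha \vee \beta), \Delta}$$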
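Read bottom-up, the conjunction and disjunction rules together with the six special rules for $\neg$ reduce any sequent to sequents of literals, which are then checked against the axioms; Cut is never needed. The Python sketch below does this, assuming a tuple representation of formulas used only here; the names `provable`, `is_literal` and `key` are hypothetical. It applies each rule once without backtracking, which relies on the rules being invertible with respect to the Kleene semantics, as the text asserts for this formulation.

```python
# Formulas as nested tuples (a representation chosen for this sketch):
# ("atom", name), ("not", f), ("and", f, g), ("or", f, g).


def is_literal(f):
    return f[0] == "atom" or (f[0] == "not" and f[1][0] == "atom")


def key(lit):
    """Map a literal to (atom, polarity)."""
    return (lit[1], True) if lit[0] == "atom" else (lit[1][1], False)


def provable(left, right):
    """Backward, cut-free proof search for the sequent left => right.
    Rules with two premises split the search into two sub-searches."""
    for i, f in enumerate(left):
        if is_literal(f):
            continue
        rest = left[:i] + left[i + 1:]
        if f[0] == "and":
            return provable(rest + [f[1], f[2]], right)
        if f[0] == "or":
            return provable(rest + [f[1]], right) and provable(rest + [f[2]], right)
        g = f[1]                                     # f = ("not", g), g compound
        if g[0] == "not":
            return provable(rest + [g[1]], right)
        if g[0] == "and":
            return (provable(rest + [("not", g[1])], right)
                    and provable(rest + [("not", g[2])], right))
        return provable(rest + [("not", g[1]), ("not", g[2])], right)   # not(or)
    for i, f in enumerate(right):
        if is_literal(f):
            continue
        rest = right[:i] + right[i + 1:]
        if f[0] == "and":
            return provable(left, rest + [f[1]]) and provable(left, rest + [f[2]])
        if f[0] == "or":
            return provable(left, rest + [f[1], f[2]])
        g = f[1]
        if g[0] == "not":
            return provable(left, rest + [g[1]])
        if g[0] == "and":
            return provable(left, rest + [("not", g[1]), ("not", g[2])])
        return (provable(left, rest + [("not", g[1])])                 # not(or)
                and provable(left, rest + [("not", g[2])]))
    # Only literals remain: the axioms with the modified closure criterion.
    l_keys, r_keys = {key(f) for f in left}, {key(f) for f in right}
    return bool(l_keys & r_keys) or any((a, not pol) in l_keys for a, pol in l_keys)


p, q = ("atom", "p"), ("atom", "q")
print(provable([], [("or", ("not", p), p)]))          # False: no excluded middle
print(provable([("not", ("not", p))], [p]))            # True
print(provable([("not", ("and", p, q))], [("or", ("not", p), ("not", q))]))  # True
print(provable([("and", p, ("not", p))], []))          # True
```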




1.3  Adding  external  negation 

There is another notion of negation which arises naturally in the three-valued
context, the external negation ${\sim}\alpha$, defined to be t if $\alpha$ is f or u, and f if $\alpha$ is
t. It is easy to check that $\phi, \alpha \models_3 \psi$ holds iff $\phi \models_3 {\sim}\alpha, \psi$, and also that
$\phi, {\sim}\alpha \models_3 \psi$ holds iff $\phi \models_3 \alpha, \psi$. Slightly more generally, we can see that the
usual sequent conditions for negation, namely:

$$\frac{\Gamma \Rightarrow \alpha, \Delta}{\Gamma, {\sim}\alpha \Rightarrow \Delta} \qquad \frac{\Gamma, \alpha \Rightarrow \Delta}{\Gamma \Rightarrow {\sim}\alpha, \Delta}$$

apply to this external negation. That is, these rules are sound when $\Rightarrow$ is
interpreted as $\models_3$, and they are also invertible with respect to the strong Kleene
semantics.
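Soundness and invertibility of the two rules for $\sim$ can be spot-checked semantically. The sketch below (Python, with the hypothetical helpers `entails3` and `rules_hold`, the encoding f = 0, u = 1, t = 2, and a small, arbitrarily chosen pool of formulas over two atoms) confirms both equivalences for external negation on that pool and shows that they fail for Kleene's $\neg$:

```python
from itertools import product

F, U, T = 0, 1, 2          # encoding chosen for this sketch


def ext_neg(a):
    """External negation ~a: t if a is f or u, f if a is t."""
    return F if a == T else T


def int_neg(a):
    """Kleene's (internal) negation."""
    return 2 - a


def entails3(gamma, delta, n_atoms):
    """Gamma |=3 Delta over all 3-assignments to n_atoms atoms;
    formulas are functions from a tuple of values to a value."""
    for v in product((F, U, T), repeat=n_atoms):
        if all(g(v) == T for g in gamma) and not any(d(v) == T for d in delta):
            return False
    return True


def rules_hold(pool, negation, n_atoms=2):
    """Check, for every phi, alpha, psi in pool, that
        phi, alpha => psi   iff  phi => -alpha, psi     and
        phi, -alpha => psi  iff  phi => alpha, psi,
    where - is the given negation."""
    for phi, alpha, psi in product(pool, repeat=3):
        neg_alpha = lambda v, a=alpha: negation(a(v))
        if entails3([phi, alpha], [psi], n_atoms) != entails3([phi], [neg_alpha, psi], n_atoms):
            return False
        if entails3([phi, neg_alpha], [psi], n_atoms) != entails3([phi], [alpha, psi], n_atoms):
            return False
    return True


# A small pool of formulas over two atoms p = v[0], q = v[1].
pool = [
    lambda v: v[0],                      # p
    lambda v: v[1],                      # q
    lambda v: int_neg(v[0]),             # not p
    lambda v: min(v[0], v[1]),           # p and q
    lambda v: max(v[0], int_neg(v[1])),  # p or not q
]

print(rules_hold(pool, ext_neg))   # True
print(rules_hold(pool, int_neg))   # False
```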

PROPOSITION: In fact these two introduction rules extend the previously
mentioned complete axiomatization of K3 to a complete axiomatization of K3~.

If we want to take the “reduced formula” approach, so that $\neg$ strictly applies
only to atoms, we have to stipulate how to interpret $\neg{\sim}\alpha$, since now reduced
formulas are built from ($\neg$-)literals using $\wedge$, $\vee$ and $\sim$. We stipulate:

$$\neg{\sim}\alpha := {\sim}{\sim}\alpha$$


1.4  A  natural  implication 

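Once the external negation is available, it is natural to define
$\alpha \supset \beta := {\sim}\alpha \vee \beta$.³

Unlike $\to_K$, this notion of implication satisfies the “deduction theorem”
relationship:

$$\Gamma, \alpha \models_3 \beta \quad\text{holds iff}\quad \Gamma \models_3 \alpha \supset \beta.$$

More generally, the usual sequent rules for implication are both sound for
$\models_3$ with respect to this notion of implication:

$$\frac{\Gamma, \beta \Rightarrow \Delta \qquad \Gamma \Rightarrow \alpha, \Delta}{\Gamma, \alpha \supset \beta \Rightarrow \Delta} \qquad \frac{\Gamma, \alpha \Rightarrow \beta, \Delta}{\Gamma \Rightarrow \alpha \supset \beta, \Delta}$$

This notion of implication is the weakest one for which this holds.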
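The truth-table of $\supset$ and the deduction-theorem property can be tabulated directly. This Python sketch assumes the same f = 0, u = 1, t = 2 encoding and a small, arbitrarily chosen pool of test formulas over two atoms; it checks the deduction theorem only on that pool, so it illustrates rather than proves the claim. The helper names are hypothetical.

```python
from itertools import product

F, U, T = 0, 1, 2          # encoding chosen for this sketch


def ext_neg(a):
    return F if a == T else T


def sup(a, b):
    """The natural implication a ⊃ b := ~a or b."""
    return max(ext_neg(a), b)


def impl_k(a, b):
    """Kleene's implication (not a) or b, for comparison."""
    return max(2 - a, b)


def entails3(gamma, delta):
    for v in product((F, U, T), repeat=2):          # two atoms p = v[0], q = v[1]
        if all(g(v) == T for g in gamma) and not any(d(v) == T for d in delta):
            return False
    return True


def deduction_theorem_holds(imp, pool):
    """Gamma, alpha |=3 beta  iff  Gamma |=3 imp(alpha, beta), checked on pool."""
    return all(
        entails3([g, a], [b]) == entails3([g], [lambda v, a=a, b=b: imp(a(v), b(v))])
        for g, a, b in product(pool, repeat=3)
    )


pool = [lambda v: v[0], lambda v: v[1], lambda v: 2 - v[0], lambda v: min(v)]

names = {F: "f", U: "u", T: "t"}
for a in (T, F, U):
    print(names[a], [names[sup(a, b)] for b in (T, F, U)])
# t ['t', 'f', 'u']
# f ['t', 't', 't']
# u ['t', 't', 't']
print(deduction_theorem_holds(sup, pool))      # True
print(deduction_theorem_holds(impl_k, pool))   # False
```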


2  Lukasiewicz’s  three-valued  logic 

Kleene’s three-valued tables differ only slightly from tables earlier presented by
Lukasiewicz, as Kleene himself points out [6]. The tables for negation, conjunction
and disjunction are identical, and so is the table for implication, except
for just one case, the case where both inputs are u. Whereas Kleene’s strong

³This notion of implication has also been employed by Schmitt [11].




table for implication makes $u \to_K u = u$, for Lukasiewicz implication we have
instead $u \to_L u = t$. Lukasiewicz’s system also contains unary operators L and
M, intended as formalizing some kind of temporal necessity and possibility. The
tables are:

$$
\begin{array}{c|cc}
 & L & M \\ \hline
t & t & t \\
f & f & f \\
u & f & t
\end{array}
$$


In contrast to Kleene’s logic, Lukasiewicz’s system does contain some
3-tautologies, for example $p \to_L p$. The price for this is that the Lukasiewicz
implication is not “regular” in Kleene’s sense, a sort of non-monotonicity.

PROPOSITION: The Lukasiewicz implication is definable within K3~:

$$\alpha \to_L \beta := (\alpha \supset \beta) \wedge (\neg\beta \supset \neg\alpha).$$

Thus $\to_L$ can be viewed as built up in two steps within K3~:

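1. First, $\alpha \supset \beta$, i.e. ${\sim}\alpha \vee \beta$, is the weakest implication $\to$ given by a
three-valued truth-table satisfying:

$$\models_3 \alpha \to \beta \quad\text{iff}\quad \alpha \models_3 \beta.$$

2. Now to get an implication which satisfies contraposition,

$$\models_3 \alpha \to \beta \quad\text{implies}\quad \models_3 \neg\beta \to \neg\alpha,$$

we take $(\alpha \supset \beta) \wedge (\neg\beta \supset \neg\alpha)$.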
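The proposition can be confirmed on all nine pairs of inputs. In this Python sketch Lukasiewicz implication is written in its usual arithmetic form, min(t, t - a + b), under the encoding f = 0, u = 1, t = 2 chosen for the illustration; the helper names are hypothetical.

```python
F, U, T = 0, 1, 2          # encoding chosen for this sketch


def neg(a):                # Kleene/Lukasiewicz negation
    return 2 - a


def ext_neg(a):            # external negation ~
    return F if a == T else T


def sup(a, b):             # a ⊃ b := ~a or b
    return max(ext_neg(a), b)


def impl_l(a, b):
    """Lukasiewicz implication in arithmetic form: min(t, t - a + b)."""
    return min(T, T - a + b)


def impl_k(a, b):
    """Kleene's strong implication (not a) or b."""
    return max(neg(a), b)


def impl_defined(a, b):
    """(a ⊃ b) and (not b ⊃ not a): the definition in the proposition."""
    return min(sup(a, b), sup(neg(b), neg(a)))


pairs = [(a, b) for a in (F, U, T) for b in (F, U, T)]
print(all(impl_defined(a, b) == impl_l(a, b) for a, b in pairs))   # True
# Kleene's and Lukasiewicz's implications differ only when both inputs are u.
print([(a, b) for a, b in pairs if impl_k(a, b) != impl_l(a, b)])   # [(1, 1)]
```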

This  shows  that  the  Lukasiewicz  system  L3  is  by  no  means  as  arbitrary  and 
un-motivated  as  has  been  commonly  alleged,  for  example  by  Urquhart  [15]  and 
Feferman  [4]. 

3 Sandewall’s non-monotonic three-valued logic

A new, very interesting non-monotonic three-valued logic has recently been
formulated by Erik Sandewall [10]. Perhaps its most interesting feature is the
explicit default operator D. $D\alpha$ is intended to represent that $\alpha$ has been
assumed by default, but D is not 3-truth-functional. Sandewall presents a kind
of intensional semantics for this operator, which has subsequently been modified
and developed by Doherty and Lukaszewicz [2]. As well as the D-operator,
Sandewall defines “external operators” L, M and N determined by the tables:

$$
\begin{array}{c|ccc}
 & L & M & N \\ \hline
t & t & t & f \\
f & f & f & f \\
u & f & t & t
\end{array}
$$




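Sandewall’s L and M are actually the same as Lukasiewicz’s, and they (together
with N) are definable from $\neg$, $\sim$ and $\wedge$:

$$L\alpha := \neg{\sim}\alpha \qquad M\alpha := {\sim}\neg\alpha \qquad N\alpha := {\sim}\alpha \wedge {\sim}\neg\alpha$$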
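These definitions can be evaluated on the three truth values to recover the tables. The Python sketch below assumes the encoding f = 0, u = 1, t = 2; as an extra cross-check it also compares M with $\neg\alpha \to_L \alpha$, the definition of possibility in Lukasiewicz's logic commonly attributed to Tarski (this comparison is an addition, not a claim of the paper).

```python
F, U, T = 0, 1, 2          # encoding chosen for this sketch


def neg(a):                # internal negation
    return 2 - a


def ext_neg(a):            # external negation ~
    return F if a == T else T


def L(a):                  # L a := not ~a
    return neg(ext_neg(a))


def M(a):                  # M a := ~ not a
    return ext_neg(neg(a))


def N(a):                  # N a := ~a and ~ not a
    return min(ext_neg(a), ext_neg(neg(a)))


def impl_l(a, b):          # Lukasiewicz implication
    return min(T, T - a + b)


names = {F: "f", U: "u", T: "t"}
for a in (T, F, U):
    print(names[a], names[L(a)], names[M(a)], names[N(a)])
# t t t f
# f f f f
# u f t t

print(all(M(a) == impl_l(neg(a), a) for a in (F, U, T)))   # True
```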

3.1 A non-monotonic entailment relation

Instead of using the relation $\models_3$ as an entailment relation, Sandewall uses a
non-monotonic modification of it, after the style of Shoham [13]. We write

$$\Gamma \models_S \beta$$

to express that $\beta$ is t in every $\sqsubseteq$-minimal 3-model of $\Gamma$. Here $\sqsubseteq$ is the information
ordering between 3-valuations, i.e. if 3-valuations or partial valuations are coded
in the obvious way as sets of literals (atoms or negated atoms), the information
ordering is just set inclusion. That is, we don’t consider all the 3-models of $\Gamma$;
we restrict ourselves to those that are “as undefined as possible”. For example,
if p, q are distinct atoms,

$$p \models_S Nq$$

holds: in the minimal 3-model for p we have p = t and q = u.
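For a small number of atoms, $\models_S$ can be computed by brute force: enumerate the 3-models, keep those that are minimal in the information ordering, and test the conclusion in each. This Python sketch assumes the encoding f = 0, u = 1, t = 2 and hypothetical helper names; it also shows the non-monotonicity, since adding q to the premises withdraws Nq.

```python
from itertools import product

F, U, T = 0, 1, 2          # encoding chosen for this sketch


def assignments(atoms):
    for vals in product((F, U, T), repeat=len(atoms)):
        yield dict(zip(atoms, vals))


def below(w1, w2):
    """Information ordering w1 ⊑ w2: viewing a 3-valuation as the set of
    literals it makes true, w1's set is included in w2's."""
    return all(w1[a] == U or w1[a] == w2[a] for a in w1)


def minimal_models(alpha, atoms):
    models = [w for w in assignments(atoms) if alpha(w) == T]
    return [w for w in models if not any(m != w and below(m, w) for m in models)]


def entails_s(alpha, beta, atoms):
    """alpha |=S beta: beta is t in every ⊑-minimal 3-model of alpha."""
    return all(beta(w) == T for w in minimal_models(alpha, atoms))


def N(a):                  # N a is t exactly when a is u
    return T if a == U else F


p = lambda w: w["p"]
Nq = lambda w: N(w["q"])
p_and_q = lambda w: min(w["p"], w["q"])

print(minimal_models(p, ["p", "q"]))       # [{'p': 2, 'q': 1}]
print(entails_s(p, Nq, ["p", "q"]))        # True
print(entails_s(p_and_q, Nq, ["p", "q"]))  # False: adding q retracts Nq
```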

Now we sketch how this non-monotonic relation $\models_S$ can be expressed in
K3~ by a kind of circumscription formula. We want to find for each formula
$\alpha$ another formula $C(\alpha)$, with the property that its models are precisely the
minimal models of $\alpha$. Then we will have the equivalence:

$$\alpha \models_S \beta \quad\text{iff}\quad C(\alpha) \models_3 \beta.$$

To illustrate, let us suppose that $\alpha = \alpha(p, q, r)$ contains just the three variables
p, q, r. Suppose that a particular 3-assignment w is a model for $\alpha$. Then
this assignment is a minimal model for $\alpha$ iff it is “as undefined as possible”:
none of the 3-assignments which have more cases of u than this one (and agree
with it wherever they are defined) is a model. Now look at each atom separately.
Consider the value w assigns to p: if it is u, then it cannot be made
informationally less. But if it is defined, i.e. either t or f, then if this value is
changed to u, the resulting assignment will no longer satisfy $\alpha$, if indeed the
model w is minimal.

Now we can test whether p is defined by the formula $p \vee \neg p$, and we can
express that the result of replacing p by u in $\alpha$ is not a model by ${\sim}\alpha(u, q, r)$.
That is,

$$(p \vee \neg p) \supset {\sim}\alpha(u, q, r).$$

Call this $M_p(\alpha)$, and likewise define

$$M_q(\alpha) := (q \vee \neg q) \supset {\sim}\alpha(p, u, r)$$

and

$$M_{p,q}(\alpha) := [(p \vee \neg p) \wedge (q \vee \neg q)] \supset {\sim}\alpha(u, u, r).$$




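We obtain the strengthening of $\alpha$ we want, to a $C(\alpha)$ which has as its
models just the minimal models of $\alpha$, by taking

$$C(\alpha) := \alpha \wedge M_p(\alpha) \wedge M_q(\alpha) \wedge M_r(\alpha) \wedge M_{p,q}(\alpha) \wedge M_{p,r}(\alpha) \wedge M_{q,r}(\alpha) \wedge M_{p,q,r}(\alpha).$$

Finally⁴, to deal with $\beta$, let $\bar{q}$ be the list of atoms which occur in $\beta$ but do
not occur in $\alpha$. These should all get the value u in a minimal model, so,
writing $N\bar{q}$ for the conjunction of the formulas $Nq_i$ over this list, we can
take the translation to be

$$C(\alpha) \wedge N\bar{q} \models_3 \beta.$$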
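The construction of $C(\alpha)$ generalizes to any finite list of atoms by taking one $M_S(\alpha)$ for every nonempty subset S. The following Python sketch (encoding f = 0, u = 1, t = 2; helper names hypothetical; atoms are replaced by the value u directly, so no propositional constant u is needed) builds $C(\alpha)$ for an example formula, checks that its models coincide with the brute-force minimal models, and then applies the final translation with $N\bar{q}$:

```python
from itertools import combinations, product

F, U, T = 0, 1, 2          # encoding chosen for this sketch


def ext_neg(a):            # ~a
    return F if a == T else T


def sup(a, b):             # a ⊃ b := ~a or b
    return max(ext_neg(a), b)


def assignments(atoms):
    for vals in product((F, U, T), repeat=len(atoms)):
        yield dict(zip(atoms, vals))


def circumscribe(alpha, atoms):
    """C(alpha) = alpha and M_S(alpha) for every nonempty S of atoms, where
    M_S(alpha) = (conjunction of (x or not x) for x in S) ⊃ ~alpha[S := u]."""
    subsets = [s for k in range(1, len(atoms) + 1) for s in combinations(atoms, k)]

    def c(w):
        result = alpha(w)
        for s in subsets:
            defined = min(max(w[x], 2 - w[x]) for x in s)   # t iff all of s defined
            w_u = {**w, **{x: U for x in s}}                  # replace atoms in s by u
            result = min(result, sup(defined, ext_neg(alpha(w_u))))
        return result
    return c


def minimal_models(alpha, atoms):
    """Brute-force ⊑-minimal models, for cross-checking C(alpha)."""
    below = lambda w1, w2: all(w1[a] in (U, w2[a]) for a in w1)
    models = [w for w in assignments(atoms) if alpha(w) == T]
    return [w for w in models if not any(m != w and below(m, w) for m in models)]


def translated_entails(alpha, alpha_atoms, beta, extra_atoms):
    """C(alpha) and N q1 and ... and N qk |=3 beta, where the q's are the
    atoms of beta not occurring in alpha (N q is t exactly when q is u)."""
    c = circumscribe(alpha, alpha_atoms)
    for w in assignments(alpha_atoms + extra_atoms):
        if c(w) == T and all(w[x] == U for x in extra_atoms) and beta(w) != T:
            return False
    return True


# alpha = (p and not q) or r over the atoms p, q, r
alpha = lambda w: max(min(w["p"], 2 - w["q"]), w["r"])
atoms = ["p", "q", "r"]
c_models = [w for w in assignments(atoms) if circumscribe(alpha, atoms)(w) == T]
print(c_models == minimal_models(alpha, atoms))        # True

# p |=S Nq via the translation (q does not occur in alpha = p)
p = lambda w: w["p"]
Nq = lambda w: T if w["q"] == U else F
print(translated_entails(p, ["p"], Nq, ["q"]))         # True
```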


4  Semantics  for  logic  programs  with  negation 

Shepherdson⁵ remarks that it would be desirable somehow to combine three-valued
with intuitionistic logic, and he suggests that one might do this starting
from a “complete and consistent deductive system for 3-valued logic, as
Ebbinghaus has done for a very similar kind of 3-valued logic”, and he cites
Ebbinghaus’s paper [3].⁶

A hybrid logic of the kind he envisages could then be used to “tighten”
further Kunen’s [7] modification of Fitting’s [5] three-valued semantics for logic
programs with negation as failure, in order to get a closer fit between the
three-valued declarative semantics of (the Clark completion of) a logic program with
negation and the procedural semantics given by SLDNF-resolution.

Kunen’s semantics matches the behaviour of SLDNF-resolution rather well,
and for propositional programs it is complete. But there are cases where the
semantics is too strong, supporting answers where nothing resembling PROLOG
would. Kunen [9] gives this example:

p ← ¬q(X)
q(c) ←
q(X) ← ¬r(X)
r(c) ←

In the Clark completion $\neg p$ is equivalent to $\forall X (X = c \vee X \neq c)$. This is always
t in Kunen’s semantics, so the semantics has to support a yes answer. What
is more, as Kunen points out, the program itself is sufficiently well-behaved
(hierarchical and strict) that the three-valued semantics for it reduces to the
two-valued semantics. Yet SLDNF-resolution can’t derive indefinite answers, so
here it won’t agree with the semantics.

⁴This treatment presupposes of course that u is a propositional constant in the language.
But if it is not, just take a new atom v not mentioned in $\alpha$ or $\beta$, add to $\alpha$ the condition $Nv$, and
treat v as though it were u.

⁵We refer to Shepherdson’s survey article [12] for any terminology not explained here, and
to the articles by Kunen [7, 8, 9] and Fitting [5].

⁶Shepherdson must have been unaware of the axiomatizations of K3 that exist in the
literature going back to Wang [17], even though Wang’s paper is cited by Ebbinghaus.


Can the semantics be weakened? Kunen raises the possibility of letting =
be interpreted as any three-valued relation which makes the equality axioms in
CET come out as t. Then there could be a model containing an element a such
that a = c is u, and then r(a), q(a) and p would be u too. But the problem with
this is that $r(a) \wedge \neg r(a)$ becomes u as well, so failure is not supported for the
query ?- r(X), ¬r(X), and SLDNF-resolution would not be sound with respect
to this weakened semantics.

Kunen also addresses Shepherdson’s suggestion to somehow bring in
intuitionistic logic. Shepherdson had shown that the original statement of soundness
of SLDNF-resolution with respect to the 2-valued consequences of Comp(P)
could be strengthened to the stricter notion of intuitionistic consequence, i.e.
derivability: if a query ?- $A_1, \ldots, A_r$ succeeds with answer $\theta$, then
$\mathrm{Comp}(P) \vdash_I \forall[(A_1 \wedge \ldots \wedge A_r)\theta]$, and if it fails, then
$\mathrm{Comp}(P) \vdash_I \forall[\neg(A_1 \wedge \ldots \wedge A_r)]$.

Kunen points out that we can’t just use intuitionistic proof theory in a
semantics for SLDNF-resolution, because from the program

p ← p

an answer no to the query ?- p, ¬p should not be supported, whereas a semantics
that took over all of intuitionistic logic would have to support this answer, as
$\vdash_I \neg(p \wedge \neg p)$. The problem here is that this is exactly the sort of example that
called for three-valued logic in the first place. The PROLOG interpreter will
loop, and this is represented by setting p to u.
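How setting p to u arises can be seen from Fitting's three-valued semantics [5] restricted to ground propositional programs. The Python sketch below iterates the three-valued immediate-consequence operator from the everywhere-u interpretation to its least fixpoint; the program representation and function names are assumptions of the sketch, and it does not attempt Kunen's first-order semantics.

```python
F, U, T = 0, 1, 2          # encoding chosen for this sketch


def literal_value(lit, interp):
    """A literal is an atom name, or ("not", atom) for negation as failure."""
    if isinstance(lit, tuple):
        return 2 - interp[lit[1]]          # Kleene negation
    return interp[lit]


def fitting_step(program, interp):
    """One application of the three-valued immediate-consequence operator:
    a head is t if some body is t, f if every body is f (or it has no
    clauses), and u otherwise.  Bodies are Kleene conjunctions; an empty
    body is t."""
    return {
        atom: max(
            (min((literal_value(l, interp) for l in body), default=T)
             for body in program.get(atom, [])),
            default=F,
        )
        for atom in interp
    }


def fitting_model(program, atoms):
    """Iterate from the everywhere-u interpretation up to the least fixpoint
    in the information ordering (finite, since the program is propositional)."""
    interp = {a: U for a in atoms}
    while True:
        nxt = fitting_step(program, interp)
        if nxt == interp:
            return interp
        interp = nxt


# The looping program  p <- p
m = fitting_model({"p": [["p"]]}, ["p"])
print(m)                                   # {'p': 1}, i.e. p = u
print(min(m["p"], 2 - m["p"]))             # 1: p and not p is u, so "no" is unsupported

# r <- .   q <- not r.
print(fitting_model({"r": [[]], "q": [[("not", "r")]]}, ["q", "r"]))   # {'q': 0, 'r': 2}
```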

On the other hand, to go back to Kunen’s first example, $\vdash_I$ does seem to
be exactly what is called for here, to represent the fact that the interpreter
can’t derive the indefinite conclusion $\forall X (X = c \vee X \neq c)$. Actually many
people have remarked that PROLOG behaves rather intuitionistically. If the
clauses are viewed directly as sequents in the obvious way, even though the logic
is supposed to be classical, everything is happening in the intuitionistic fragment,
since (definite) clauses are a particular case of intuitionistic sequents, i.e. with
a single formula, actually an atom, in the succedent.

4.1 Intuitionistic three-valued logic

The following system seems to be a reasonable way of carrying out Shepherdson’s
proposal to amalgamate intuitionistic and three-valued logic to get “$\vdash_{3I}$”. We
start with the full version of the sequent calculus for first-order logic.⁷ To be
definite, let us take it to be the version in Kleene’s treatise [6].

• To make the system intuitionistic, we impose the restriction that all sequents
in proofs have at most one formula in the succedent.

• To make the system three-valued, we ban the use of $\Rightarrow\neg$:

$$\frac{\Gamma, \alpha \Rightarrow \Delta}{\Gamma \Rightarrow \neg\alpha, \Delta}$$

• To offset (partially) the loss of $\Rightarrow\neg$, we retain those of the special rules
for negation previously formulated that are intuitionistically correct as
well. (Details omitted here because of space limitations.)

⁷The three-valued treatment sketched so far extends straightforwardly to predicate logic.




• Optionally we can postulate that double negation elimination holds for
atoms by adding an axiom $\neg\neg p(x) \Rightarrow p(x)$ for each atomic predicate p(x).

This specifies the inference mechanism. My modified version of Kunen’s
semantics uses these rules for the “intuitionistic fragment” of K3. The semantics
is still recursively enumerable and Comp(P) is essentially unchanged, but we
emphasize that:

1. The constraint that = is to be two-valued has to be expressed by the
sequent $\Rightarrow \neg(X = Y \wedge X \neq Y)$, and not by $\Rightarrow X = Y \vee X \neq Y$. In the context of
intuitionistic logic the disjunction has a distinctive meaning, such that we
can’t affirm a disjunction unless we are in a position to affirm one or other
of the disjuncts, and this notion would be too strong for what we want to
say.

2. We interpret the disjunction in the completed definitions in Comp(P) as
the intuitionistic disjunction.

This tightening of Kunen’s semantics evidently satisfies soundness, and it
matches the procedural behaviour of SLDNF-resolution better. Going back to
Kunen’s example, we can see that:

1. A yes answer to the query ?- ¬p is now not supported, since we know that
$\Rightarrow X = Y \vee X \neq Y$ is not derivable using $\vdash_{3I}$ from Comp(P), not even
from $\neg(X = Y \wedge X \neq Y)$. We know this because $X = Y \vee X \neq Y$
is underivable even with full intuitionistic logic $\vdash_I$, and intuitionistically
$\neg(X = Y \wedge X \neq Y)$ is also an instance of a logical law.

2. Failure is supported for the query ?- r(X), ¬r(X), because r(X) is equivalent
to X = c, and $\Rightarrow \neg(X = c \wedge X \neq c)$ is an instance of our postulated
$\Rightarrow \neg(X = Y \wedge X \neq Y)$.

3. On the other hand, we are not postulating $\Rightarrow \neg(p \wedge \neg p)$ for all other atoms
p or for all formulas, only for the equality predicate X = Y. We know
that $\Rightarrow \neg(p \wedge \neg p)$ is not derivable, because it isn’t derivable in K3. Thus
we don’t get a problem with the query ?- p, ¬p addressed to the program
p ← p.

References 

[1] K. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic
and Databases. Plenum, New York, 1978.

[2]  P.  Doherty.  NML3  -  A  Non-Monotonic  Formalism  with  Explicit  Defaults. 
PhD  thesis,  Linkoping  University,  Sweden,  1991. 




[3] H.-D. Ebbinghaus. Über eine Prädikatenlogik mit partiell definierten
Prädikaten und Funktionen. Arch. math. Logik, 12:39-53, 1969.

[4] S. Feferman. Towards useful type-free theories I. Journal of Symbolic Logic,
49:75-111, 1984.

[5] M. C. Fitting. A Kripke/Kleene semantics for logic programs. J. Logic
Programming, pages 295-312, 1985.

[6] S. C. Kleene. Introduction to Metamathematics. D. Van Nostrand Company
Inc., Princeton, New Jersey, 1952.

[7] K. Kunen. Negation in logic programming. J. Logic Programming, pages
289-308, 1987.

[8] K. Kunen. Some remarks on the completed database. In R. A. Kowalski and
K. A. Bowen, editors, Logic Programming, Proceedings of the Fifth International
Conference and Symposium, volume 2, pages 978-992, Cambridge,
Massachusetts and London, England, 1988. MIT Press.

[9] K. Kunen. Signed data dependencies in logic programs. Journal of Logic
Programming, 7:231-245, 1989.

[10] E. Sandewall. The semantics of non-monotonic entailment defined using
partial interpretations. In M. Reinfrank, J. de Kleer, M. Ginsberg, and
E. Sandewall, editors, Non-Monotonic Reasoning, 2nd International Workshop,
Grassau, FRG, June 1988, Proceedings, volume 346 of Lecture Notes
in Artificial Intelligence, pages 27-41, Berlin Heidelberg New York, 1989.
Springer-Verlag.

[11] P. H. Schmitt. Computational aspects of three-valued logic. In Proc. 8th
Conf. Automated Deduction, volume 230 of Lecture Notes in Computer
Science, pages 190-198. Springer-Verlag, 1986.

[12] J. C. Shepherdson. Negation in logic programming. In J. Minker, editor,
Foundations of Deductive Databases and Logic Programming, pages 19-88.
Morgan Kaufmann, Los Altos, California, 1988. Previously available as
Report PM-01-87, School of Mathematics, University of Bristol.

[13] Y. Shoham. Reasoning about Change. MIT Press, 1988.

[14] R. M. Smullyan. First-Order Logic. Springer-Verlag, 1968.

[15] A. Urquhart. Many-valued logic. In D. M. Gabbay and F. Guenther, editors,
Handbook of Philosophical Logic, volume III, pages 71-116. D. Reidel, 1986.

[16] J. van Benthem. A Manual of Intensional Logic. CSLI, Stanford University,
California 94305, 2nd edition, 1988.

[17] H. Wang. The calculus of partial predicates and its extension to set theory
I. Zeitschr. math. Logik Grundl. Math., 7:283-288, 1961.


Compiling Proof Search in Semantic Tableaux

Joachim Posegga

Universität Karlsruhe

Institut für Logik, Komplexität und Deduktionssysteme
Am Fasanengarten 5, 7500 Karlsruhe, FRG
posegga@ira.uka.de

February 8, 1993

Abstract 

An approach to implementing deduction systems based on semantic tableaux
is described; it works by compiling a graphical representation of a fully expanded
tableau into a program that performs the search for a proof at runtime. This results
in more efficient proof search, since the tableau need not be expanded any more;
the proof consists only of determining whether it can be closed. It is shown how
the method can be applied for compiling to the target language Prolog, although
any other general-purpose language can be used.

1  Introduction 

The basic idea of tableau-based systems (see (Fitting, 1990) for a good introduction) is
to try to prove inconsistency of a formula by failure of a systematic model-construction
process: one tries to satisfy a formula by stepwise refinement of potential models, and
a proof is found if all those models can be ruled out by detecting contradictions in them.
This working principle has become more and more popular in automated deduction in
recent years, after resolution had dominated the field for two decades. Essentially,
tableau-based proof procedures offer a closer relation to semantics than resolution does. This
is handy if a deduction system is supposed to be integrated into an application, rather
than designed as a stand-alone prover. Interest in such integration is steadily increasing,
which explains the shift of interest.

In order to apply deduction techniques, a concrete implementation method is
required: the standard way of implementing tableau-based provers is to keep an explicit
data structure representing a tableau (usually as a set of its branches) and modify it
during runtime. This can be compared with an interpreter for a programming language:
the tableau is a program containing statements to be executed (i.e. formulae to be
expanded), until certain conditions are met and the program terminates (i.e. all branches
are closed). This paper shows how the proof search can be sped up by compilation
techniques. The basic idea is to compile a fully expanded tableau into a program that
carries out the proof search at runtime.
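For comparison with the compiled approach, the interpreter-style method described above can be sketched as follows. This is a minimal propositional illustration in Python (not Prolog), assuming formulas in negation normal form and a tuple representation chosen only for the example; the function name is hypothetical. It keeps an explicit list of branches and expands them at runtime.

```python
# NNF formulas as nested tuples (representation chosen for this sketch):
# ("atom", p), ("not", ("atom", p)), ("and", f, g), ("or", f, g).


def tableau_closes(formula):
    """Interpreter-style proof search: keep an explicit list of open branches,
    each a pair (formulas still to expand, signed literals already on it),
    and expand one formula at a time until every branch is closed."""
    branches = [([formula], frozenset())]
    while branches:
        pending, lits = branches.pop()
        if not pending:
            return False                   # fully expanded open branch: a model exists
        f, rest = pending[0], pending[1:]
        if f[0] == "and":                  # alpha rule: both parts on the same branch
            branches.append(([f[1], f[2]] + rest, lits))
        elif f[0] == "or":                 # beta rule: the branch splits in two
            branches.append(([f[1]] + rest, lits))
            branches.append(([f[2]] + rest, lits))
        else:                              # literal: record it, drop the branch on a clash
            lit = (f[1], True) if f[0] == "atom" else (f[1][1], False)
            if (lit[0], not lit[1]) not in lits:
                branches.append((rest, lits | {lit}))
    return True                            # every branch closed: inconsistent


p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
print(tableau_closes(("and", p, ("not", p))))                                  # True
print(tableau_closes(("and", p, ("or", q, ("not", r)))))                       # False
print(tableau_closes(("and", ("or", p, q), ("and", ("not", p), ("not", q)))))  # True
```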




The underlying idea is derived from the author’s work on compilation techniques
for first-order deduction with Shannon graphs (Posegga & Ludascher, 1992; Posegga,
forthcoming 1993) and works as follows: First, an arbitrary first-order formula is
transformed into a graphical representation of a fully expanded tableau for it. Then, the
graph is compiled into a program which shows the formula’s inconsistency when it is
executed. The execution reflects the proof search in semantic tableaux and tries to close
every branch in the tableau. We will show how the principle works for Prolog as target
language, although any other general-purpose language can be used.

The advantage of our approach is that some of the effort for the proof search (namely
expanding the tableau) can be moved to a preprocessing phase that derives the graph
and generates the program for it. This preprocessing is of only linear complexity in time
and space w.r.t. the length of the negation normal form of the input formula. This is
due to the fact that a graph instead of a tree is used for representing the fully expanded
tableau; it uses structure sharing and represents multiple occurrences of subtrees in a
tableau only once.
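The effect of structure sharing can be illustrated on a formula whose tableau tree grows exponentially. The Python sketch below is one possible construction, not necessarily the paper's: it threads a shared continuation through conjunctions and lets both sides of a disjunction point to the same continuation, so that each subformula contributes a constant number of graph nodes. The class and function names are hypothetical.

```python
class Node:
    """A node of the tableau graph: 'end' (the mark 1), a signed literal
    ('lit') with a successor in left, or a branching point ('or')."""
    def __init__(self, kind, lit=None, left=None, right=None):
        self.kind, self.lit, self.left, self.right = kind, lit, left, right


def compile_graph(f, cont):
    """Build the fully expanded tableau for an NNF formula f as a graph
    whose branches all continue with cont (shared, not copied)."""
    if f[0] == "and":
        return compile_graph(f[1], compile_graph(f[2], cont))
    if f[0] == "or":
        return Node("or", left=compile_graph(f[1], cont), right=compile_graph(f[2], cont))
    lit = (f[1], True) if f[0] == "atom" else (f[1][1], False)
    return Node("lit", lit=lit, left=cont)


def graph_size(node, seen=None):
    """Number of distinct nodes (shared subgraphs counted once)."""
    seen = set() if seen is None else seen
    if node is None or id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + graph_size(node.left, seen) + graph_size(node.right, seen)


def tree_size(node):
    """Number of nodes if every shared subgraph were copied out as a tree."""
    if node is None:
        return 0
    return 1 + tree_size(node.left) + tree_size(node.right)


# (a1 or b1) and (a2 or b2) and ... and (a10 or b10), nested to the right
n = 10
f = ("or", ("atom", f"a{n}"), ("atom", f"b{n}"))
for i in range(n - 1, 0, -1):
    f = ("and", ("or", ("atom", f"a{i}"), ("atom", f"b{i}")), f)

g = compile_graph(f, Node("end"))
print(graph_size(g), tree_size(g))   # 31 4093
```

The graph grows by three nodes per clause, while the corresponding tree roughly doubles with each clause, because every disjunction duplicates everything that follows it.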

Note that there is only a notational difference between a tableau represented as a
graph and as a tree. From a theoretical point of view, trees are much handier than graphs,
since we can use a linear notation and regard them simply as logical formulae. We will
refer to trees for the theoretical treatment of our method, but the reader should keep in
mind that an implementation should use a graphical representation.

The paper is written from a practical point of view and assumes familiarity with
the theoretical background of semantic tableaux. When arguing on the implementation
level, clarity and readability are preferred over showing how to achieve efficient code.
The paper starts by discussing compilation techniques for propositional formulae in
Section 2. The framework is carried forward to the first-order level in Section 3;
Section 4 draws conclusions from our research.


2 Propositional Logic

For reasons that will soon become clear, we will use a slightly unusual notation for writing down tableaux and consider them simply as logical formulae with signed atoms, where "+" and "-" serve as signs. Let $\mathcal{L}$ be the language of propositional calculus defined in the usual way, and let $\mathcal{L}_A$ be the atomic formulae of $\mathcal{L}$.

Definition 2.1 (Set of Fully Expanded Propositional Tableaux)

The set $Tab$ of fully expanded propositional tableaux is defined to be the smallest set such that

1. $1 \in Tab$ ("1" denotes the atomic truth value "true");

2. if $\mathcal{T} \in Tab$ and $A \in \mathcal{L}_A$, then $(+A) \wedge \mathcal{T} \in Tab$ and $(-A) \wedge \mathcal{T} \in Tab$;

3. if $\mathcal{T}_1, \mathcal{T}_2 \in Tab$, then $\mathcal{T}_1 \vee \mathcal{T}_2 \in Tab$.

The elements of $Tab$ will be denoted by letters of the calligraphic alphabet $\mathcal{A}, \mathcal{B}, \ldots$




The intuition behind this notation is that the formulae on a branch denote a conjunction, and that branching means disjunction. As a simple example, consider the formula $p \wedge (q \vee \neg r)$. Assume we start a tableau with this formula and expand it completely in the standard way. We get a tableau whose root $+p$ branches into $+q$ and $-r$. This tableau can be written as $+p \wedge ((+q \wedge 1) \vee (-r \wedge 1))$, which is an element of $Tab$.

The atom "1" is superfluous, but handy, as we will see soon. It can be regarded as a mark for the end of a branch.

Next, we will see how such a representation can be derived for a formula. The basic idea is to recursively compute fully expanded tableaux for the components of compound formulae, and to combine these according to the logical connectives. The following notation will be needed for handling conjunctions:

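Definition 2.2 (Replacement of 1-nodes)

Let $\mathcal{A}, \mathcal{B} \in Tab$. The replacement of 1-nodes in $\mathcal{A}$ by $\mathcal{B}$, written $\mathcal{A}[\mathcal{B}]$, is recursively defined as:

$$
\mathcal{A}[\mathcal{B}] = \begin{cases}
\mathcal{B} & \text{if } \mathcal{A} = 1\\
\mathcal{A}_1 \wedge \mathcal{A}_2[\mathcal{B}] & \text{if } \mathcal{A} = \mathcal{A}_1 \wedge \mathcal{A}_2 \text{ (where } \mathcal{A}_1 \text{ is a signed atom)}\\
\mathcal{A}_1[\mathcal{B}] \vee \mathcal{A}_2[\mathcal{B}] & \text{if } \mathcal{A} = \mathcal{A}_1 \vee \mathcal{A}_2
\end{cases}
$$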


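The following recursive function conv maps an arbitrary propositional formula to a fully expanded propositional tableau for it:

Definition 2.3

$$
conv(F) = \begin{cases}
(+F) \wedge 1 & \text{if } F \in \mathcal{L}_A\\
(-A) \wedge 1 & \text{if } F = \neg A,\ A \in \mathcal{L}_A\\
conv(F') & \text{if } F = \neg\neg F'\\
conv(\alpha_1)[conv(\alpha_2)] & \text{if } F \text{ is an } \alpha\text{-formula}\\
conv(\beta_1) \vee conv(\beta_2) & \text{if } F \text{ is a } \beta\text{-formula}
\end{cases}
$$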

We can perform the replacement operation in the above definition by replacing edges to 1-nodes, rather than replacing the nodes themselves. In that case we derive a directed acyclic graph. One easily verifies that, in this case, computing the mapping has linear complexity in time and space w.r.t. the length of the input formula in negation normal form.
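The edge-replacement idea can be sketched in Python. This is a minimal illustration, not the author's implementation. Formulas are assumed to be nested tuples. Rather than replacing edges after the fact, the hypothetical function `conv` receives the node `k` that should stand in for the 1-leaves. Both branches of a β-node receive the same `k`, so the second graph of a conjunction is built once and shared. This sharing keeps the result linear in the size of the input.

```python
from dataclasses import dataclass

# Assumed formula encoding (hypothetical, for illustration only):
#   ("atom", name), ("not", F), ("and", F, G), ("or", F, G)

class One:
    """The 1-leaf that marks the end of a branch."""

ONE = One()

@dataclass(eq=False)          # eq=False keeps identity semantics for shared nodes
class Lit:
    sign: str                 # "+" or "-"
    atom: str
    succ: object              # successor node

@dataclass(eq=False)
class Or:
    left: object
    right: object

def conv(f, k=ONE):
    """Graph for f whose 1-leaves are replaced by the node k."""
    tag = f[0]
    if tag == "atom":
        return Lit("+", f[1], k)
    if tag == "and":                               # alpha: conv(a1)[conv(a2)]
        return conv(f[1], conv(f[2], k))
    if tag == "or":                                # beta: k is shared by both branches
        return Or(conv(f[1], k), conv(f[2], k))
    if tag == "not":
        g = f[1]
        if g[0] == "atom":
            return Lit("-", g[1], k)
        if g[0] == "not":                          # double negation
            return conv(g[1], k)
        if g[0] == "or":                           # alpha: not A and not B
            return conv(("not", g[1]), conv(("not", g[2]), k))
        if g[0] == "and":                          # beta: not A or not B
            return Or(conv(("not", g[1]), k), conv(("not", g[2]), k))
    raise ValueError(f"unsupported formula: {f!r}")

def count_nodes(root):
    """Number of distinct nodes reachable from root (shared nodes count once)."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue
        seen.add(id(n))
        if isinstance(n, Lit):
            stack.append(n.succ)
        elif isinstance(n, Or):
            stack.extend([n.left, n.right])
    return len(seen)

a, b = ("atom", "a"), ("atom", "b")
# (a <-> b) & (a & ~b), with a <-> b written as the beta-formula (~a & ~b) | (a & b)
f = ("and", ("or", ("and", ("not", a), ("not", b)), ("and", a, b)),
            ("and", a, ("not", b)))
print(count_nodes(conv(f)))   # 8
```

For the example discussed next, the graph has eight distinct nodes: the 1-leaf plus seven literal and branching nodes. A tree would duplicate the tableau for $a \wedge \neg b$ under both branches.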

As a simple example, assume we want to derive a graph representing a fully expanded tableau for $(a \leftrightarrow b) \wedge a \wedge \neg b$. First, we compute the graph for $(a \leftrightarrow b)$, which is a β-formula with $\beta_1 = \neg a \wedge \neg b$ and $\beta_2 = a \wedge b$. Next, we compute the graph for $a \wedge \neg b$. We then combine both conjunctively by replacing all edges to 1-nodes in the first graph by an edge to the second graph:


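[Figure: on the left, the graph for (a ↔ b) (a binary node with branches -a → -b → 1 and +a → +b → 1) and the graph for a ∧ ¬b (+a → -b → 1); on the right, the combined graph: binary node 5 with branches 3: -a → 4: -b → 6: +a → 7: -b → 1 and 1: +a → 2: +b → (6), where (6) refers back to node 6.]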


The numbers attached to each node in the graph will be needed later for the compilation. We will assume that the generated nodes are numbered consecutively, although it would be sufficient for the numbers to identify each node uniquely. The node "(6)" is just an aid for drawing the graph: it means that the edge leading to it actually leads to node number 6.

Apart from the fact that the tableau is represented as a graph with numbered nodes, there is no difference from a standard semantic tableau. The above formula is inconsistent, so the tableau represented by the right graph is closed, i.e., each branch of the tableau is contradictory. In terms of the graph, this means that both paths from the root to the 1-leaf are contradictory. Once we have derived such a graph for a propositional formula, the only thing left to do to obtain a proof is to test this condition.
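As a worked check of this condition, the following derivation was added for illustration. It reads the input as $(a \leftrightarrow b) \wedge (a \wedge \neg b)$ and writes out the tableau computed by conv, abbreviating the shared part as $\mathcal{K}$:

$$
\begin{aligned}
\mathcal{K} &= conv(a \wedge \neg b) = (+a) \wedge ((-b) \wedge 1)\\
conv(a \leftrightarrow b) &= \bigl((-a) \wedge ((-b) \wedge 1)\bigr) \vee \bigl((+a) \wedge ((+b) \wedge 1)\bigr)\\
conv\bigl((a \leftrightarrow b) \wedge (a \wedge \neg b)\bigr) &= \bigl((-a) \wedge ((-b) \wedge \mathcal{K})\bigr) \vee \bigl((+a) \wedge ((+b) \wedge \mathcal{K})\bigr)
\end{aligned}
$$

The first branch collects $-a, -b, +a, -b$ and is closed by the pair $-a, +a$. The second branch collects $+a, +b, +a, -b$ and is closed by the pair $+b, -b$. Both branches are contradictory, so the tableau is closed.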

This is exactly the idea of the proposed method for compiling the proof search: we compute the graph in the above way and compile it into a program that is procedurally equivalent to the above test. Any general-purpose target language can be used, but it is particularly easy to explain the process with Prolog.

As propositional logic is decidable, we can determine whether or not the graph we consider has an open path to a 1-leaf, and therefore whether the tableau has an open branch. If this is the case, we have derived a model for the formula. We will generate a program that enumerates all models, i.e., one that enumerates all open paths in the graph. The method to achieve this is quite straightforward. For each node in the graph, a Prolog clause node/2 is generated that succeeds if an open path through this node to the 1-leaf exists. We will use the numbers of the graph nodes to distinguish the clauses for the nodes. This number is the first parameter of a node clause¹; the second parameter is the path that has been constructed to reach the node.

There are two types of nodes in a graph:

1. binary "∨"-nodes: in this case the clause for the node succeeds if an open path can be constructed through one of the successors.

2. unary "∧"-nodes labeled with a literal: here, the clause succeeds if the literal can be added to the path without yielding a contradiction, and an open path through the successor node exists. If the successor node is 1, the last condition is always true.

¹ This is done for performance reasons, since Prolog systems usually perform indexing on the first argument of clauses.


A minor technical problem to be solved is finding an efficient representation of paths. We can easily determine the number of different atoms in a formula during the construction of the graph, so a good solution is to use a Prolog term of this arity. Each argument in this term represents the truth value of the corresponding atom (denoted by "+" or "-"). The following clauses "implement" the graph for the fully expanded tableau of $(a \leftrightarrow b) \wedge a \wedge \neg b$ above. The atom a appears at the first position in the path, and b at the second. It should be easy to see that satisfy succeeds with a path (a model) if there is an open branch in the tableau, and fails otherwise:


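    % node(N, Path): an open path through node N to the 1-leaf exists.
    % Path is a term path(A, B) holding the signs of a (arg 1) and b (arg 2).
    node(5, Path) :- (node(3, Path) ; node(1, Path)).   % binary node
    node(3, Path) :- arg(1, Path, -), node(4, Path).     % -a
    node(4, Path) :- arg(2, Path, -), node(6, Path).     % -b
    node(6, Path) :- arg(1, Path, +), node(7, Path).     % +a
    node(7, Path) :- arg(2, Path, -).                    % -b, then the 1-leaf
    node(1, Path) :- arg(1, Path, +), node(2, Path).     % +a
    node(2, Path) :- arg(2, Path, +), node(6, Path).     % +b
    satisfy(Path) :- functor(Path, path, 2), node(5, Path).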
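The listing above has a regular shape, so it can be emitted mechanically from the graph. The following Python sketch is one hypothetical way to do this; the graph encoding and the names `GRAPH`, `LEAF` and `compile_to_prolog` are assumptions. Binary nodes become disjunctions, and literal nodes become `arg/3` tests on the path term. A node whose successor is the 1-leaf needs no further call.

```python
# Graph for (a <-> b) & a & ~b with the node numbers used in the text.
# ("or", n1, n2) is a binary node; ("lit", sign, atom, succ) is a unary node.
LEAF = "1-leaf"
GRAPH = {
    5: ("or", 3, 1),
    3: ("lit", "-", "a", 4),
    4: ("lit", "-", "b", 6),
    6: ("lit", "+", "a", 7),
    7: ("lit", "-", "b", LEAF),
    1: ("lit", "+", "a", 2),
    2: ("lit", "+", "b", 6),
}

def compile_to_prolog(graph, root, positions):
    """Emit one node/2 clause per graph node plus a satisfy/1 entry point.
    positions maps each atom to its (1-based) argument slot in the path term."""
    clauses = []
    for n, entry in graph.items():
        if entry[0] == "or":
            _, left, right = entry
            clauses.append(f"node({n}, Path) :- (node({left}, Path) ; node({right}, Path)).")
            continue
        _, sign, atom, succ = entry
        check = f"arg({positions[atom]}, Path, {sign})"
        if succ == LEAF:              # successor is the 1-leaf: nothing left to check
            clauses.append(f"node({n}, Path) :- {check}.")
        else:
            clauses.append(f"node({n}, Path) :- {check}, node({succ}, Path).")
    clauses.append(f"satisfy(Path) :- functor(Path, path, {len(positions)}), node({root}, Path).")
    return "\n".join(clauses)

print(compile_to_prolog(GRAPH, 5, {"a": 1, "b": 2}))
```

Run with a in argument position 1 and b in position 2, it prints the eight clauses of the listing in the same order, without the comments.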


In the propositional case it is surprisingly easy to switch from Prolog to another target language. As we have seen, the only thing to do is to generate code that simulates descending in the graph and collects literals that form paths. In conventional programming languages, this can be achieved by defining a function for each node. Each function assigns the corresponding truth value to the atom of its node, if possible, and then recursively calls the functions for the successor nodes. Practical experiments with C and 8086 Assembler have shown that Prolog programs of the above kind run roughly as fast as C, but about 20-30 times slower than Assembler.
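The same scheme can be sketched for a conventional language. The following Python version is only an illustration, not the C or Assembler code used in the experiments. Every graph node becomes a closure that assigns the node's truth value to its atom, if possible, and then calls the closure of its successor. Generators take the place of Prolog backtracking. The path is a list with one slot per atom, mirroring the `path/2` term but with 0-based positions. The names `compile_graph` and `satisfy` are hypothetical.

```python
LEAF = "1-leaf"
GRAPH = {                       # same example graph as above
    5: ("or", 3, 1),
    3: ("lit", "-", "a", 4),
    4: ("lit", "-", "b", 6),
    6: ("lit", "+", "a", 7),
    7: ("lit", "-", "b", LEAF),
    1: ("lit", "+", "a", 2),
    2: ("lit", "+", "b", 6),
}

def compile_graph(graph, positions):
    """Build one function per node; each takes the current path (a list
    holding "+", "-" or None per atom) and yields every reachable model."""
    funcs = {}

    def leaf(path):
        yield list(path)               # reached the 1-leaf: the path is open

    def make_or(left, right):
        def node(path):
            yield from funcs[left](path)
            yield from funcs[right](path)
        return node

    def make_lit(sign, pos, succ):
        def node(path):
            if path[pos] is None:      # atom still unassigned: assign it
                path[pos] = sign
                yield from funcs[succ](path)
                path[pos] = None       # undo the assignment when backtracking
            elif path[pos] == sign:    # consistent with the path so far
                yield from funcs[succ](path)
            # otherwise the literal contradicts the path: no model here
        return node

    funcs[LEAF] = leaf
    for n, entry in graph.items():
        if entry[0] == "or":
            funcs[n] = make_or(entry[1], entry[2])
        else:
            _, sign, atom, succ = entry
            funcs[n] = make_lit(sign, positions[atom], succ)
    return funcs

def satisfy(graph, root, positions):
    """Return all models (open paths) found from the root."""
    funcs = compile_graph(graph, positions)
    return list(funcs[root]([None] * len(positions)))

print(satisfy(GRAPH, 5, {"a": 0, "b": 1}))   # []: the formula is inconsistent
```

The functions are built once, before the search starts. The run itself only calls them, which is the point of compiling rather than interpreting the graph.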


3 First-order Logic

The basic difference in the proof search between propositional and first-order tableaux is that in the latter case we must deal with γ-formulae on branches. Such formulae have the peculiarity that they may be used more than once during the proof search. It is therefore not sufficient to determine a fully expanded tableau for a formula and try to show that it is closed. We will handle this by extending the definition of a fully expanded tableau so that not only atoms, but also fully expanded tableaux for γ-formulae can appear on branches. Such a node will be called a γ-node; the graph inside it is called a γ-graph.

Let $\mathcal{L}$ be the language of first-order calculus and $\mathcal{L}_A$ the atomic formulae of $\mathcal{L}$.

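Definition 3.1 (Set of Fully Expanded First-order Tableaux)

The set $Tab_\forall$ of fully expanded first-order tableaux is defined to be the smallest set such that

1. $Tab \subseteq Tab_\forall$;

2. if $\mathcal{T}_1, \mathcal{T}_2 \in Tab_\forall$ and $x_1, \ldots, x_n$ are free variables in $\mathcal{T}_1$, then $((\forall(x_1, \ldots, x_n)\,\mathcal{T}_1) \wedge \mathcal{T}_2) \in Tab_\forall$.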


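The first-order counterpart of conv will be denoted by $conv_\forall$ and is defined as follows:

Definition 3.2

$$
conv_\forall(F) = \begin{cases}
(+F) \wedge 1 & \text{if } F \in \mathcal{L}_A\\
(-A) \wedge 1 & \text{if } F = \neg A,\ A \in \mathcal{L}_A\\
conv_\forall(F') & \text{if } F = \neg\neg F'\\
conv_\forall(\alpha_1)[conv_\forall(\alpha_2)] & \text{if } F \text{ is an } \alpha\text{-formula}\\
conv_\forall(\beta_1) \vee conv_\forall(\beta_2) & \text{if } F \text{ is a } \beta\text{-formula}\\
(\forall\bar{x}\; conv_\forall(F')) \wedge 1 & \text{if } F \text{ is a } \gamma\text{-formula of the form } \forall\bar{x}\,F'\\
conv_\forall\bigl(F'\{f_1(\bar{y})/x_1, \ldots, f_n(\bar{y})/x_n\}\bigr) & \text{if } F \text{ is a } \delta\text{-formula of the form } \exists(x_1, \ldots, x_n)\,F'
\end{cases}
$$

In the δ-case, $\bar{y}$ are all the free variables in $F'$ other than $x_1, \ldots, x_n$, and $f_1, \ldots, f_n$ are new function symbols.

It is assumed that $\bar{x}$ stands for a finite list of variables $x_1, \ldots, x_m$. Skolemization is performed on "∃"-formulae in the "liberalized δ-rule" style described in (Hähnle & Schmitt, 1991). Substitutions are written in braces.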
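The δ-case can be illustrated with a small Python sketch of liberalized Skolemization. The sketch makes several assumptions of its own. Variables are strings, and terms and atoms are tuples. Formulas are rectified, meaning no variable is bound twice, so no capture can occur. The Skolem symbols `sk0`, `sk1`, … come from a counter passed in by the caller. The Skolem terms depend only on the free variables of the δ-formula itself, which is what distinguishes the liberalized rule from the standard free-variable δ-rule.

```python
from itertools import count

# Assumed encoding: variables are str; atoms and function terms are tuples
# (symbol, arg1, ..., argn); formulas use ("not", F), ("and", F, G),
# ("or", F, G), ("forall", (x1, ...), F) and ("exists", (x1, ...), F).
QUANTIFIERS = {"forall", "exists"}

def free_vars(f):
    """Free variables of a term or formula."""
    if isinstance(f, str):
        return {f}
    if f[0] in QUANTIFIERS:
        return free_vars(f[2]) - set(f[1])
    return set().union(*(free_vars(a) for a in f[1:]))

def substitute(f, sigma):
    """Apply {variable: term}; assumes a rectified formula (no capture)."""
    if isinstance(f, str):
        return sigma.get(f, f)
    if f[0] in QUANTIFIERS:
        inner = {v: t for v, t in sigma.items() if v not in f[1]}
        return (f[0], f[1], substitute(f[2], inner))
    return (f[0],) + tuple(substitute(a, sigma) for a in f[1:])

def skolemize_delta(f, fresh):
    """delta-case: exists(x1..xn) F'  ->  F'{f1(y)/x1, ..., fn(y)/xn},
    where y are the free variables of the delta-formula f itself
    (liberalized rule) and fresh yields new Skolem-symbol indices."""
    if f[0] != "exists":
        raise ValueError("not a delta-formula")
    xs, body = f[1], f[2]
    ys = tuple(sorted(free_vars(f)))
    sigma = {x: (f"sk{next(fresh)}",) + ys for x in xs}
    return substitute(body, sigma)

print(skolemize_delta(("exists", ("x",), ("not", ("i", "x"))), count()))
# ('not', ('i', ('sk0',)))
```

Treating the negated conclusion $\neg\forall x\, i(x)$ of the problem below as the δ-formula $\exists x\, \neg i(x)$ yields $\neg i(sk0)$. This is the literal used at node 10 of the program in Table 1.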


Figure 1 shows the graph representing a fully expanded tableau for the following problem, taken from Pelletier's problem set (Pelletier, 1986):

Problem 3.3 (Pelletier 30)

Axiom 30.1: $\forall x\,(f(x) \vee g(x) \rightarrow \neg h(x))$

Axiom 30.2: $\forall x\,((g(x) \rightarrow \neg i(x)) \rightarrow (f(x) \wedge h(x)))$

Axiom 30.3: $\neg\forall x\, i(x)$

The left graph represents the fully expanded tableau for the whole formula; the two graphs for the γ-formulae are drawn separately to the right of it.

Compiling such a graph for a fully expanded first-order tableau is slightly more complicated than in the propositional case. We will again explain the principle with Prolog as the target language. First, we must choose whether we aim at proving validity or inconsistency. In principle this does not matter, but for historical reasons the latter is usually preferred. We will stick to this, simply because "reversing" things would make them less familiar and therefore less comprehensible.

For each node a clause is generated that succeeds if the paths crossing this node are contradictory. A clause for a binary node succeeds if both clauses for its successor nodes succeed. A clause for a unary node labeled with a literal L succeeds if either

• there is a substitution σ such that the current path and L become inconsistent under this substitution, or

• the clause for the successor node succeeds.

[Figure 1: Tableau for Problem 3.3. Left: the graph for the whole formula, consisting of the γ-nodes +∀₁[A] and +∀₂[B], followed by node 10: -i(sk0) and the 1-leaf. Right: the γ-graph for +∀₁[A], with branches 1: -f(A) → 2: -g(A) → 1 and 3: -h(A) → 1; and the γ-graph for +∀₂[B], with branches 5: +g(B) → 6: +i(B) → 1 and 7: +f(B) → 8: +h(B) → 1.]

If we cross a γ-node for a fully expanded tableau of a γ-formula, we just note this in the path, but we do not enter the γ-graph yet. If we arrive at a 1-leaf with a consistent path, one of the γ-tableaux noted in the path is selected and entered².

The basic technical problems that must be solved for the compilation process are:

1. Variable bindings need to be represented and passed to each clause, since Prolog clauses are, by definition, variable disjoint.

This can be solved in the following way. When constructing the graph, all variables that appear are counted. If we have n variables, their bindings can be represented by an n-ary Prolog term, each argument holding the binding for the corresponding variable.

2. It is in general necessary to have more than one instance of an atom (since γ-formulae may be used multiple times), so we will not know the maximal length of a path in advance.

To handle this, we will represent a path by an open list holding signed instances of atoms³.

3. γ-graphs introduce new variable bindings.

If variable bindings are handled as described in 1, the Prolog clauses for a γ-graph representing a fully expanded tableau for a γ-formula can be "re-used" arbitrarily often, provided a new binding term is created. The slot(s) holding the binding(s) for the quantified variable(s) are simply dropped and left void by inserting an anonymous Prolog variable. It is therefore possible to re-use the tableaux for γ-formulae without asserting new clauses.

² This reflects the usual treatment of γ-formulae in tableaux: they are not used if the branch can be closed immediately.

³ There are, of course, more efficient solutions than this, but we prefer readability over efficiency here.

Table 1 shows the complete Prolog program for Problem 3.3. Recall that a clause succeeds if all paths crossing the corresponding node are closed.

Each clause node/4 "implements" one node. The first parameter is the number of the node in the graph. The second parameter is used to implement a depth bound on the search that controls the number of applications of γ-formulae. The third parameter is the path, and the fourth is the term holding the variable bindings. Paths are represented by an open list holding signed atoms. The clauses for gamma1 and gamma2 also add terms gamma(N) to the path, naming the top node N of a γ-graph. If the end of a path ("1") is reached without a contradiction having been found, use_gamma/3 selects one of these γ-graphs and calls its entry clause, unless the depth bound has been reached. close/2 tries to find a substitution under which a path is closed. Note that this predicate must enumerate all substitutions during backtracking.

A proof is done by calling the top node (gamma1) with the empty path: prove(2) succeeds, whereas prove(1) fails.
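The following sketch shows how the clauses of Table 1 cooperate by transliterating the program into Python. Generators play the role of Prolog backtracking. It is an illustration under stated assumptions, not the author's code. Terms are tuples, logic variables are instances of a hypothetical `Var` class, substitutions are dictionaries, and unification includes an occurs check. The hypothetical helpers `lit_node` and `gamma_top` build the node functions, which otherwise follow Table 1 clause by clause, including the depth bound in `use_gamma`.

```python
class Var:
    """A logic variable, compared by identity; bindings live in a dict."""

def walk(t, s):
    """Follow the bindings of substitution s until t is not a bound variable."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t is v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(x, y, s):
    """Extend s to a most general unifier of x and y (with occurs check), or None."""
    x, y = walk(x, s), walk(y, s)
    if x is y:
        return s
    if isinstance(x, Var):
        return None if occurs(x, y, s) else {**s, x: y}
    if isinstance(y, Var):
        return unify(y, x, s)
    if x[0] != y[0] or len(x) != len(y):
        return None
    for a, b in zip(x[1:], y[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

def close(sign, atom, path, s):
    """close/2: yield each substitution that makes (sign, atom) complementary
    to a literal on the path; gamma markers on the path are skipped."""
    for tag, item in path:
        if tag in ("+", "-") and tag != sign:
            s2 = unify(atom, item, s)
            if s2 is not None:
                yield s2

def use_gamma(limit, path, bind, s):
    """use_gamma/3: enter a gamma-graph noted on the path, within the depth bound."""
    if limit == 0:
        return
    for tag, item in path:
        if tag == "gamma":
            yield from NODES[item](limit - 1, path, bind, s)

def lit_node(sign, make_atom, succ):
    """Unary node: its literal closes the path, or the successor does."""
    def node(limit, path, bind, s):
        atom = make_atom(bind)
        nxt = succ if callable(succ) else NODES[succ]
        yield from close(sign, atom, path, s)
        yield from nxt(limit, [(sign, atom)] + path, bind, s)
    return node

def gamma_top(slot, left, right):
    """Entry node of a gamma-graph: a fresh variable fills `slot` (the
    anonymous variable in Table 1); both successors must close."""
    def node(limit, path, bind, s):
        fresh = tuple(Var() if i == slot else t for i, t in enumerate(bind))
        for s1 in NODES[left](limit, path, fresh, s):
            yield from NODES[right](limit, path, fresh, s1)
    return node

SK0 = ("sk0",)
NODES = {
    "gamma1": lambda lim, p, b, s: NODES["gamma2"](lim, [("gamma", 4)] + p, b, s),
    "gamma2": lambda lim, p, b, s: NODES[10](lim, [("gamma", 9)] + p, b, s),
    1: lit_node("-", lambda b: ("f", b[0]), 2),
    2: lit_node("-", lambda b: ("g", b[0]), use_gamma),
    3: lit_node("-", lambda b: ("h", b[0]), use_gamma),
    4: gamma_top(0, 1, 3),
    5: lit_node("+", lambda b: ("g", b[1]), 6),
    6: lit_node("+", lambda b: ("i", b[1]), use_gamma),
    7: lit_node("+", lambda b: ("f", b[1]), 8),
    8: lit_node("+", lambda b: ("h", b[1]), use_gamma),
    9: gamma_top(1, 5, 7),
    10: lit_node("-", lambda b: ("i", SK0), use_gamma),
}

def prove(limit):
    """prove/1: True iff all paths close within `limit` gamma applications."""
    return next(NODES["gamma1"](limit, [], (Var(), Var()), {}), None) is not None

print(prove(1), prove(2))   # False True
```

With a bound of 2, the first γ-application instantiates the second axiom with B = sk0. The second γ-application instantiates the first axiom with A = sk0, which closes the remaining branch. A bound of 1 is not enough, which matches the behaviour stated above.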

4 Conclusion & Outlook

We have described an approach to tableau-based theorem proving that works by translating an arbitrary first-order formula into a graph representing a fully expanded tableau for it. This graph can then be compiled into a program that performs the proof search when executed. We showed how to do this with Prolog as the target language, although any other general-purpose language can be used.

The principle of compiling formulae into a general-purpose target language offers several advantages; inter alia,

• it can considerably speed up the proof search,

• it offers a flexible way of embedding deduction into applications, since the compilation can generate stand-alone subroutines that do not require a logical engine to run on.

The presented Prolog code can also be optimized further, especially the first-order version. The propositional version seems not to be very far from an optimum. However, joining clauses that have only one caller ("unfolding" the Prolog code) might result in more efficient Prolog code.

The method differs from other approaches to deduction by Horn-clause generation (e.g., (Stickel, 1988)) in that

1. the generated program has no direct logical relation to the formula that is to be proven (i.e., the Prolog clauses are not a logically equivalent variant of the formula); instead, they are procedurally equivalent to the search for a model.

2. The method does not require a conjunctive normal form.


    % use_gamma(Limit, Path, VarBnd): enter one of the gamma-graphs noted
    % on the path, unless the depth bound is exhausted.
    use_gamma(0, _, _) :- !, fail.
    use_gamma(Limit, Path, VarBnd) :-
        NewLimit is Limit - 1,
        member(gamma(N), Path),
        node(N, NewLimit, Path, VarBnd).

    % close(L, Path): L is complementary to some literal on Path under a
    % substitution (sound unification); candidates are enumerated on backtracking.
    close(+L1, [H|Path]) :-
        (H = -L2, unify_with_occurs_check(L1, L2)) ; close(+L1, Path).
    close(-L1, [H|Path]) :-
        (H = +L2, unify_with_occurs_check(L1, L2)) ; close(-L1, Path).

    % Top of the tableau: note both gamma-graphs (nodes 4 and 9) on the path.
    node(gamma1, Limit, Path, VarBnd) :-
        node(gamma2, Limit, [gamma(4)|Path], VarBnd).
    node(gamma2, Limit, Path, VarBnd) :-
        node(10, Limit, [gamma(9)|Path], VarBnd).

    % gamma-graph of the first axiom (variable A): nodes 1 to 4.
    node(1, Limit, Path, bind(A,B)) :-
        (close(-f(A), Path) ; node(2, Limit, [-f(A)|Path], bind(A,B))).
    node(2, Limit, Path, bind(A,B)) :-
        (close(-g(A), Path) ; use_gamma(Limit, [-g(A)|Path], bind(A,B))).
    node(3, Limit, Path, bind(A,B)) :-
        (close(-h(A), Path) ; use_gamma(Limit, [-h(A)|Path], bind(A,B))).
    % Entry node: the anonymous variable gives A a fresh binding.
    node(4, Limit, Path, bind(_,B)) :-
        node(1, Limit, Path, bind(A,B)),
        node(3, Limit, Path, bind(A,B)).

    % gamma-graph of the second axiom (variable B): nodes 5 to 9.
    node(5, Limit, Path, bind(A,B)) :-
        (close(+g(B), Path) ; node(6, Limit, [+g(B)|Path], bind(A,B))).
    node(6, Limit, Path, bind(A,B)) :-
        (close(+i(B), Path) ; use_gamma(Limit, [+i(B)|Path], bind(A,B))).
    node(7, Limit, Path, bind(A,B)) :-
        (close(+f(B), Path) ; node(8, Limit, [+f(B)|Path], bind(A,B))).
    node(8, Limit, Path, bind(A,B)) :-
        (close(+h(B), Path) ; use_gamma(Limit, [+h(B)|Path], bind(A,B))).
    % Entry node: the anonymous variable gives B a fresh binding.
    node(9, Limit, Path, bind(A,_)) :-
        node(5, Limit, Path, bind(A,B)),
        node(7, Limit, Path, bind(A,B)).

    % Skolemized negated conclusion.
    node(10, Limit, Path, VarBnd) :-
        (close(-i(sk0), Path) ; use_gamma(Limit, [-i(sk0)|Path], VarBnd)).

    prove(Limit) :- node(gamma1, Limit, [], _).

Table 1: Prolog Program for the Tableau of Figure 1


3. The cost of translating a formula into the proposed representation of a fully expanded tableau and the cost of compiling the tableau are both linear (in space and time) w.r.t. the length of the input formula.

One can argue that the proof search in a tableau-based prover usually works by developing the tableau step by step until each branch in it is closed. It might therefore happen that branches close with some of the compound formulae they hold. Considering only fully expanded tableaux, as this approach proposes, results in a tableau proof procedure that never closes a branch in this way, but only with literals. From a theoretical point of view this might prevent finding a short proof. In practice, however, it very rarely happens that branches can indeed be closed with compound formulae, so it seems justified to neglect this issue.

An experimental implementation of a variant of the method (see (Posegga & Ludäscher, 1992)) has shown the speedup gained by generating a Prolog program instead of using a "traditional" approach to implementing a tableau prover (without compilation) in Prolog: it is around a factor of 10. Assembler has also been implemented as a target language for a propositional version. It results in a program that runs another 20-30 times faster than the version compiling to Prolog.


References 

Melvin C. Fitting. First-Order Logic and Automated Theorem Proving. Springer, New York, 1990.

Reiner Hähnle & Peter H. Schmitt. The liberalized δ-rule in free variable semantic tableaux. To appear, 1991.

Francis Jeffry Pelletier. Seventy-five problems for testing automatic theorem provers. Journal of Automated Reasoning, 2:191-216, 1986.

Joachim Posegga & Bertram Ludäscher. Towards first-order deduction based on Shannon graphs. In Proc. German Workshop on Artificial Intelligence, LNAI, Bonn, Germany, 1992. Springer.

Joachim Posegga. First-order Deduction with Shannon Graphs. PhD thesis, University of Karlsruhe, Karlsruhe, FRG, forthcoming 1993.

Mark E. Stickel. A Prolog Technology Theorem Prover. In E. Lusk & R. Overbeek, editors, 9th International Conference on Automated Deduction, Argonne, Ill., May 1988. Springer-Verlag.




Short CNF in Finitely-Valued Logics

Reiner Hähnle*

Institut für Logik, Komplexität und Deduktionssysteme
Universität Karlsruhe, 7500 Karlsruhe, Germany

haehnle@ira.uka.de


Abstract. We present a transformation of formulae from arbitrary finitely-valued logics into a conjunctive normal form based on signed atomic formulae. This normal form can be used to syntactically characterize many-valued validity with a simple resolution rule very much like the one in classical logic. The transformation is always linear with respect to the size of the input, and we define a generalized concept of polarity in order to remove clauses which are not needed in the proof. The transformation rules are based on the concept of 'sets-as-signs' developed earlier by the author in the context of tableau-based deduction in many-valued logics. We claim that the approach presented here is much more efficient than existing approaches to many-valued resolution.


Introduction 

With this paper we make a step toward the efficient mechanization of deduction in many-valued logics. The need for research of this kind is motivated by the recent advent of new applications for many-valued theorem proving in various subfields, for example, in formal hardware verification [5]. Other applications exist in the theory of error-correcting codes and in non-monotonic reasoning.

It is widely acknowledged that the existence of clausal normal forms for a logic can greatly improve the efficiency and speed of theorem proving procedures for that logic. Resolution-based theorem provers usually rely on the input being in conjunctive normal form (CNF), and most other proof procedures that claim high performance also employ CNF transformation as a preprocessing step. If these successful techniques are to be used in non-classical theorem proving, some variant of CNF is likely to be required for the respective non-classical logics.

There are three main obstacles that have to be overcome when clausal normal forms are to be used in a generalized context:

1. Normal forms can become exponentially long wrt the length of the input when a naive algorithm is used. This is not so problematic in classical logic, where knowledge bases usually consist of conjunctions of relatively short formulae. In non-classical logics, however, even relatively short formulae can become quite large during this process.

2. The normalized input bears no resemblance to the original formula. This makes it hard to explain the machine-generated proof to the user.

3. Many non-classical logics may fail to have normal forms, or at least it is non-trivial to find them.

The first two problems can in principle be solved by using a structure preserving clause form translation (defined in the following section). Such a translation has the double advantage of (i) producing normal forms in linear time and space wrt the input and (ii) establishing a relationship between the clauses of the normal form and the subformulae of the input formula.

* Research supported by Deutsche Forschungsgemeinschaft (DFG).

As to the third problem, a uniform solution is unlikely, owing to the diversity of non-classical logics. It has been shown, however, that for certain classes of non-classical logics, structure preserving CNF transformations (and corresponding resolution rules) can be devised that give the desired results [7]. These include intuitionistic logic and various modal logics.

The purpose of this note is to define structure preserving clause form translations (together with a suitable definition of clauses and a resolution rule) for arbitrary finitely-valued logics. To the best of our knowledge, no general results exist so far in this domain. In the worst case, the normal form computation will be linear wrt the length of the input and quadratic wrt the number of truth values. The short CNF translation for many-valued logics proposed in the following is centered around a technique that has been developed earlier by the author [3, 4] in connection with non-clausal theorem proving with semantic tableaux. The main advantage is a relatively simple resolution procedure for finitely-valued logics. It avoids the drawbacks of a non-clausal approach [11] while retaining the main advantages of resolution, notably, strategies for pruning the search space. We treat the propositional case thoroughly and give some hints on how to handle the first-order case. Due to space restrictions we omit proofs. These may be found in the long version of this paper, which is available from the author on request.

1 Short Normal Forms in Classical Logic

In the following we denote with $\#(M)$ the cardinality of a set $M$ and with $|s|$ the length of a string $s$. We use $\lceil x \rceil$ to denote the ceiling function on the rationals. We assume the reader is familiar with the basic notions of computational logic. Throughout the paper we will use a standard syntax for propositional and first-order logic, occasionally enriched with some new unary and binary operator symbols. Clauses are considered as finite multisets of literals.

The central idea behind structure preserving clause form translations is to introduce additional atoms which serve as abbreviations for subformulae of the input. Assume we have a propositional formula $\phi$ and we need a finite set of clauses $X_\phi$ such that $\models \phi$ iff $X_\phi \vdash \Box$.

Let $SF(\phi)$ denote the set of subformulae of $\phi$ (note that $\#(SF(\phi)) \le |\phi|$) and let $m = \#(SF(\phi))$. We denote with $\overline{L}$ the complement of a literal $L$. Now we introduce a new variable $p_i$ for each $\phi_i \in SF(\phi)$ which is not a literal. For each $1 \le i \le m$ with $\phi_i = (\phi_j \mathbin{op} \phi_k)$, we consider the formula

$$p_i \leftrightarrow (p_j \mathbin{op} p_k) \tag{1}$$

where op is the top-most connective of $\phi_i$, and $p_j$, $p_k$ either correspond to $\phi_j$, $\phi_k$, or $p_l = \phi_l$ if $\phi_l$ is a literal. Various authors call this process abbreviation, definition or renaming. Let $X_{\phi_i}$ be a CNF representation of (1). The number of clauses in $X_{\phi_i}$ is bounded by a constant depending on the type of connectives present and is at most 4; each clause contains at most 3 literals. Now we can define


$$X_\phi = \{\neg p_1\} \cup \bigcup_{i=1}^{m} X_{\phi_i} \tag{2}$$

where $p_1$ is the definition of $\phi$. It is fairly easy to see that $X_\phi$ indeed has the desired properties; in particular, $X_\phi$ contains at most $12m + 1$ literals.
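The renaming construction behind (1) and (2) can be sketched as follows. This Python version is illustrative only, and its encoding is an assumption. Formulas are nested tuples over ∧, ∨, ⊃ and ¬, and literals are (name, polarity) pairs. The new atoms are named p1, p2, … in the top-down order in which their subformulae are visited. The hypothetical function `rename` returns the clause set $X_\phi$, which is unsatisfiable iff $\phi$ is a tautology.

```python
from itertools import count

# Assumed encoding: atoms are str; compound formulas are ("not", F),
# ("and", F, G), ("or", F, G) or ("imp", F, G). A literal is a pair
# (name, polarity) and a clause is a tuple of literals.

def neg(lit):
    return (lit[0], not lit[1])

def definition_clauses(p, op, a, b):
    """CNF of p <-> (a op b) for literals p, a, b."""
    if op == "and":
        return [(neg(p), a), (neg(p), b), (p, neg(a), neg(b))]
    if op == "or":
        return [(neg(p), a, b), (p, neg(a)), (p, neg(b))]
    if op == "imp":
        return [(neg(p), neg(a), b), (p, a), (p, neg(b))]
    raise ValueError(f"unsupported connective {op!r}")

def rename(phi):
    """Return X_phi: {~p1} plus the definition clauses of every renamed
    subformula; phi is a tautology iff this clause set is unsatisfiable."""
    ids = count(1)
    clauses = []

    def lit_of(f):
        if isinstance(f, str):
            return (f, True)
        if f[0] == "not" and isinstance(f[1], str):
            return (f[1], False)                 # literals are not renamed
        p = (f"p{next(ids)}", True)              # new atom, numbered top-down
        if f[0] == "not":                        # p <-> ~a
            a = lit_of(f[1])
            clauses.extend([(neg(p), neg(a)), (p, a)])
        else:
            a, b = lit_of(f[1]), lit_of(f[2])
            clauses.extend(definition_clauses(p, f[0], a, b))
        return p

    root = lit_of(phi)
    return [(neg(root),)] + clauses

example = ("imp", "p", ("imp", "q", "p"))     # p -> (q -> p)
print(len(rename(example)))                   # 7
```

For the tautology $p \supset (q \supset p)$ it produces seven clauses: the unit clause $\neg p_1$ and three clauses for each of the two renamed implications.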




In the first-order case the $p_i$ are atomic formulae with an appropriate arity.

Example 1. Consider the propositional tautology $p \supset (q \supset p)$. We introduce the following renamings: $p_1 \leftrightarrow (p \supset p_2)$, $p_2 \leftrightarrow (q \supset p)$. So the formula is a tautology iff the following set of clauses is unsatisfiable:

1. $\neg p_1$

2. $\neg p_1 \vee p_2 \vee \neg p$

3. $p_1 \vee p$

4. $p_1 \vee \neg p_2$

5. $\neg p_2 \vee p \vee \neg q$

6. $p_2 \vee q$

7. $p_2 \vee \neg p$

A refutation of this clause set is as follows:

8. [1,3] $p$

9. [1,4] $\neg p_2$

10. [7,8] $p_2$

11. [9,10] $\Box$

Note that clauses 2. and 5. (each corresponding to one half of an equivalence) and clause 6. were not needed. Obviously, our CNF contains redundant clauses.

There is a rather obvious improvement of the procedure. Instead of the logical equivalence in (1), only one direction of the implication is needed to characterize satisfiability, depending on the polarity (cf. [10]) of $p_i$ in the original formula (i.e., in $\neg\phi$). Therefore, instead of (1) we write

$$
\begin{cases}
p_i \supset (p_j \mathbin{op} p_k) & \text{if } p_i \text{ occurs positively in } \neg\phi\\
(p_j \mathbin{op} p_k) \supset p_i & \text{if } p_i \text{ occurs negatively in } \neg\phi\\
p_i \leftrightarrow (p_j \mathbin{op} p_k) & \text{if } p_i \text{ occurs both positively and negatively in } \neg\phi
\end{cases} \tag{3}
$$

If we apply this optimization to the previous example, clauses 2. and 5. are not generated.

Further improvements are possible if not all subformulae are renamed; for instance, conjunctions need not be renamed. An optimal result (the clause set $\{p, \neg p, q\}$) would have been obtained using a top-down renaming algorithm [2] which takes this into account. Additional references regarding structure preserving clause forms for classical logic can also be found in [2].
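The refutability of small clause sets such as the one in Example 1 can be checked by naive saturation. The sketch below is a plain propositional resolution loop without strategies. Clauses are sets of (name, polarity) literals, so duplicate literals merge, unlike the multisets of the text. It is written only to confirm two things: the seven clauses of Example 1 are refutable, and so is the subset left by the polarity optimization of (3), which omits clauses 2 and 5.

```python
from itertools import combinations

def lit(s):
    """'~p2' -> ('p2', False); 'p' -> ('p', True)."""
    return (s[1:], False) if s.startswith("~") else (s, True)

def clause(text):
    return frozenset(lit(s) for s in text.split())

def resolvents(c1, c2):
    """All resolvents of two clauses (clauses as sets of literals)."""
    for name, pol in c1:
        if (name, not pol) in c2:
            yield (c1 - {(name, pol)}) | (c2 - {(name, not pol)})

def refutable(clauses):
    """Saturate under binary resolution; True iff the empty clause appears."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True
                new.add(r)
        if new <= known:
            return False
        known |= new

# Clauses 1-7 of Example 1
EXAMPLE1 = {1: clause("~p1"), 2: clause("~p1 p2 ~p"), 3: clause("p1 p"),
            4: clause("p1 ~p2"), 5: clause("~p2 p ~q"), 6: clause("p2 q"),
            7: clause("p2 ~p")}
print(refutable(EXAMPLE1.values()))                        # True
print(refutable([EXAMPLE1[i] for i in (1, 3, 4, 6, 7)]))   # True: without 2 and 5
```

The loop terminates because only finitely many clauses exist over a finite set of atoms. Dropping clause 1 leaves a satisfiable set, for which the loop returns False.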


2 Many-Valued Logic

Definition 1 Syntax, Truth Values. Let $L$ be a propositional language with propositional variables $L_0$ and connectives $\mathcal{F}$. Let $N$ be the finite set of truth values and let $n = \#(N)$.

Definition 2 Semantics, Many-Valued Logic. Connectives $F \in \mathcal{F}$ are interpreted as functions with finite range and domain. In other words, if $k$ is the arity of $F$, we associate with $F$ a function $f: N^k \to N$, which we call the interpretation of $F$. Let $\mathbf{f}$ be the family of functions over $N$ associated with the connectives in $\mathcal{F}$. Then we call $\mathbf{f}$ an n-valued matrix for $L$ and the triple $(L, \mathbf{f}, N)$ an n-valued propositional logic.

Definition 3 Valuation. Let $\mathcal{L} = (L, \mathbf{f}, N)$ be an n-valued propositional logic. A valuation for $\mathcal{L}$ is a function $v: L_0 \to N$. As usual, $v$ can be uniquely extended to a homomorphism from $L$ to $N$ via

$$v(F(\phi_1, \ldots, \phi_k)) = f(v(\phi_1), \ldots, v(\phi_k))$$

where $f$ is the interpretation of $F$.




Definition 4 S-Satisfiable, S-Tautology. For $S \subseteq N$ and an n-valued propositional logic $\mathcal{L}$, call a formula $\phi \in L$ S-satisfiable iff there is a valuation $v$ such that $v(\phi) \in S$. Call $\phi$ an S-tautology iff $v(\phi) \in S$ for all valuations $v$.
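Definitions 2 to 4 can be made concrete by brute force: enumerate all $n^k$ valuations of the $k$ variables of $\phi$ and evaluate $\phi$ homomorphically. The Python sketch below does this for three-valued strong Kleene logic. Only disjunction (max) is fixed by the text, in Example 2. The interpretations of conjunction (min) and negation (1 - x) are the usual strong Kleene choices, assumed here for illustration. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Three-valued strong Kleene logic with truth values N = {0, 1/2, 1}.
HALF = Fraction(1, 2)
N = (Fraction(0), HALF, Fraction(1))
MATRIX = {"or": max, "and": min, "not": lambda x: 1 - x}   # interpretations

# Formulas: a propositional variable is a str; a compound is (connective, args...).
def variables(phi):
    if isinstance(phi, str):
        return {phi}
    return set().union(*(variables(a) for a in phi[1:]))

def value(phi, v):
    """Extend the valuation v (dict variable -> truth value) homomorphically."""
    if isinstance(phi, str):
        return v[phi]
    return MATRIX[phi[0]](*(value(a, v) for a in phi[1:]))

def valuations(phi):
    names = sorted(variables(phi))
    for vals in product(N, repeat=len(names)):
        yield dict(zip(names, vals))

def is_s_satisfiable(phi, S):
    return any(value(phi, v) in S for v in valuations(phi))

def is_s_tautology(phi, S):
    return all(value(phi, v) in S for v in valuations(phi))

excluded_middle = ("or", "p", ("not", "p"))
print(is_s_tautology(excluded_middle, {1}))          # False: v(p) = 1/2 gives 1/2
print(is_s_tautology(excluded_middle, {HALF, 1}))    # True
```

Brute force costs $n^k$ evaluations, which is exactly what a syntactic characterization by clauses and resolution is meant to avoid.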

For some examples we refer the reader to the following section. Our task is now to find

1. a language of clauses $\mathcal{C}$;

2. a structure preserving linear translation $tr$ from $L \times 2^N$ into $2^{\mathcal{C}}$;

3. a resolution rule on $\mathcal{C}$, i.e., a decidable relation $R \subseteq \mathcal{C}^{k+1}$ for some $k$,

such that $tr(\phi, S) \vdash \Box$ iff $\phi$ is an S-tautology (where $\vdash$ is the reflexive and transitive closure of $R$).

3 A Structure Preserving Normal Form Translation for
Many-Valued Logics

In [3, 4] the author introduced semantic tableau systems that can be used to implement a generic theorem prover which performs efficiently in a variety of finitely-valued logics. The key idea was to enhance the formula language so that the valuations still to be considered at each step of the proof can be tracked efficiently. The technical device was the use of truth value sets as signs, or prefixes, in front of the formulae. We define the set of signed formulae $L^s = \{S : \phi \mid S \subseteq N, \phi \in L\}$ with the intended meaning

$$S : \phi \text{ is satisfiable iff } v(\phi) \in S \text{ for some } v.$$

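Example 2. Consider the premise $\{\frac12\} : \phi \vee \psi$, where $\vee$ is three-valued strong Kleene disjunction, defined by $v(\phi \vee \psi) = \max(v(\phi), v(\psi))$. A sound and complete rule with this premise is

$$\frac{\{\tfrac12\} : \phi \vee \psi}{\begin{array}{c} \{\tfrac12\} : \phi \\ \{0, \tfrac12\} : \psi \end{array} \quad\Big|\quad \begin{array}{c} \{0, \tfrac12\} : \phi \\ \{\tfrac12\} : \psi \end{array}}$$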

One way to visualize rules with a premise $S : (\phi \mathbin{op} \psi)$ uses coverings of those entries in the truth table of $op$ that are members of the set $S$. Each of the rule extensions corresponds to a partial covering of these entries. The union of all coverings corresponds to the collection of extensions that make up the conclusion of a rule. In Example 2 above, the left extension covers the area indicated in the diagram on the left below, and the right extension covers the area shown in the diagram on the right.


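[Figure: two copies of the truth table of $\vee$; the left one marks the entries covered by the left extension, the right one the entries covered by the right extension.]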
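The claim that the rule of Example 2 is sound and complete means that the union of the two coverings is exactly the set of truth-table entries with value $\frac12$. The Python sketch below checks this by enumerating all nine entries of the three-valued $\vee$ table. It is illustrative only, and its names are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
N = (Fraction(0), HALF, Fraction(1))


def disj(a, b):
    """Three-valued strong Kleene disjunction."""
    return max(a, b)


# Each extension of the rule is a conjunction: (sign of phi, sign of psi).
left_extension = ({HALF}, {Fraction(0), HALF})
right_extension = ({Fraction(0), HALF}, {HALF})


def covered(extension):
    """Truth-table entries (v(phi), v(psi)) covered by one extension."""
    s_phi, s_psi = extension
    return {(a, b) for a, b in product(N, N) if a in s_phi and b in s_psi}


premise_entries = {(a, b) for a, b in product(N, N) if disj(a, b) == HALF}
union = covered(left_extension) | covered(right_extension)
print(premise_entries == union)  # True: no extra entries (sound), none missing (complete)
```

The two coverings overlap in the entry $(\frac12, \frac12)$. This is harmless for tableau rules, since the extensions are disjunctively connected.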

Now, tableau rules and tableaux correspond to DNF formulae, while we are interested in CNF formulae. What we need, therefore, are inverse tableau rules, in which the extensions are conjunctively connected and each extension is itself a clause over signed atoms. Consequently, we define the language of clauses $C$ to be the clauses over $L_0^s$. The $\vee$ in $C$ will be interpreted classically, i.e. as two-valued, just like the implicit disjunctions of tableau branches in many-valued tableaux:




Definition 5 Satisfiability on C. An atomic signed formula $S : p$ is satisfied by $v$ iff $v(p) \in S$. Let $D \in C$ with $D = S_1 : p_1 \vee \cdots \vee S_k : p_k$. $D$ is satisfied by $v$ iff $v$ satisfies at least one $S_i : p_i$ in $D$. A clause set $X \subseteq C$ is satisfied by $v$ iff $v$ satisfies every member of $X$; we write $v \models X$ for this fact. $X$ is satisfiable iff $v \models X$ for some $v$.
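Definition 5 translates directly into code. This hypothetical Python sketch makes a few representation choices: a signed atom $S : p$ is a pair (frozenset of truth values, atom name), and a clause is a list of such pairs. Satisfiability of a clause set is decided by enumerating all valuations, which is feasible only for small examples.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
N3 = (Fraction(0), HALF, Fraction(1))


def lit(sign, atom):
    """A signed atom S : p, with the sign S stored as a frozenset of truth values."""
    return (frozenset(sign), atom)


def satisfies_clause(v, clause):
    """v satisfies S1:p1 v ... v Sk:pk iff v(pi) is in Si for at least one i."""
    return any(v[atom] in sign for sign, atom in clause)


def satisfies(v, clauses):
    """v |= X iff v satisfies every member of X."""
    return all(satisfies_clause(v, c) for c in clauses)


def is_satisfiable(clauses, truth_values=N3):
    """X is satisfiable iff v |= X for some v (brute force over all valuations)."""
    names = sorted({atom for c in clauses for _, atom in c})
    return any(
        satisfies(dict(zip(names, vals)), clauses)
        for vals in product(truth_values, repeat=len(names))
    )


X = [
    [lit({Fraction(0)}, "p")],
    [lit({HALF, Fraction(1)}, "p"), lit({Fraction(1)}, "q")],
]
print(is_satisfiable(X))                                      # True (p = 0, q = 1)
print(is_satisfiable(X + [[lit({HALF, Fraction(1)}, "p")]]))  # False
```

The empty clause is satisfied by no valuation, because `any` over no literals is false. This matches its role as $\Box$ in the refutations below.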


How can we compute inverse tableau rules? We can still use the technique with coverings, but we must turn things around. Each extension (or clause) corresponds to a covering that contains at least the entries occurring in $S$. The intersection of all coverings must contain exactly the entries occurring in $S$.


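Example 3. The inverse tableau rule with the premise of Example 2 is:

$$\frac{\{\tfrac12\} : \phi \vee \psi}{\{0, \tfrac12\} : \phi \;\Big\Vert\; \{0, \tfrac12\} : \psi \;\Big\Vert\; \begin{array}{c} \{\tfrac12\} : \phi \\ \{\tfrac12\} : \psi \end{array}}$$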


The  extensions  correspond  to  the  following  coverings: 


[Figure: one copy of the truth table of $\vee$ per extension, each marking the entries covered by the corresponding clause.]

The conclusion of the rule corresponds to the following set of $C$-clauses:

$$\{\{0, \tfrac12\} : \phi,\ \{0, \tfrac12\} : \psi,\ \{\tfrac12\} : \phi \vee \{\tfrac12\} : \psi\}$$


We  draw  inverse  tableau  rules  with  double  vertical  bars  to  distinguish  them  from  the 
ordinary  rules. 
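The covering condition for Example 3 can be checked mechanically: each clause must cover at least the $\frac12$-entries, and the intersection of all coverings must be exactly those entries. In this illustrative Python sketch, position 0 stands for $\phi$ and position 1 for $\psi$; the encoding is hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
N = (Fraction(0), HALF, Fraction(1))

# Conclusion of the inverse rule; position 0 stands for phi, position 1 for psi.
clauses = [
    [({Fraction(0), HALF}, 0)],
    [({Fraction(0), HALF}, 1)],
    [({HALF}, 0), ({HALF}, 1)],
]


def covering(clause):
    """Truth-table entries (v(phi), v(psi)) on which the clause is true."""
    return {entry for entry in product(N, N)
            if any(entry[pos] in sign for sign, pos in clause)}


intersection = set(product(N, N))
for c in clauses:
    intersection &= covering(c)

premise_entries = {(a, b) for a, b in product(N, N) if max(a, b) == HALF}
print(intersection == premise_entries)                        # True
print(all(premise_entries <= covering(c) for c in clauses))   # True
```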

The  next  step  is  to  express  logical  equivalence  within  this  framework.  Consider  the 
following  definition  of  (strong)  many-valued  equivalence: 


$$v(\phi \equiv \psi) = \begin{cases} 1 & \text{if } v(\phi) = v(\psi) \\ 0 & \text{otherwise} \end{cases}$$


Let us give a formulation of $\equiv$ with $C$-clauses. It will be convenient to use the following abbreviations for signs:


Definition 6 Signs. Let $j \in N$ be arbitrary.

$${\geq}j := [j, 1] \cap N \qquad {>}j := (j, 1] \cap N \qquad {\leq}j := [0, j] \cap N \qquad {<}j := [0, j) \cap N$$


Now consider the $(2n - 2)$ $C$-clauses of the following form:

$${\leq}j : p \vee {>}j : q \qquad\qquad {>}j : p \vee {\leq}j : q \qquad\qquad \text{where } j < 1$$

Let us denote this clause set by $X_\equiv$. It is easy to prove that for every valuation $v$, $v(p \equiv q) = 1$ iff $v \models X_\equiv$, and $v(p \equiv q) = 0$ iff $v \not\models X_\equiv$.
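A short Python sketch can generate $X_\equiv$ for a given $n$. By enumerating all pairs of truth values, it confirms that a valuation satisfies $X_\equiv$ exactly when it gives $p$ and $q$ the same value. The sketch assumes the clauses have the form ${\leq}j : p \vee {>}j : q$ and ${>}j : p \vee {\leq}j : q$ for each $j \in N$ with $j < 1$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def truth_values(n):
    """N = {0, 1/(n-1), ..., 1}."""
    return tuple(Fraction(i, n - 1) for i in range(n))


def leq(j, N):
    """Sign <=j, i.e. [0, j] intersected with N."""
    return frozenset(x for x in N if x <= j)


def gt(j, N):
    """Sign >j, i.e. (j, 1] intersected with N."""
    return frozenset(x for x in N if x > j)


def equivalence_clauses(p, q, n):
    """X_equiv: for every j < 1 the clauses <=j:p v >j:q and >j:p v <=j:q."""
    N = truth_values(n)
    clauses = []
    for j in N:
        if j < 1:
            clauses.append([(leq(j, N), p), (gt(j, N), q)])
            clauses.append([(gt(j, N), p), (leq(j, N), q)])
    return clauses


def satisfies(v, clauses):
    return all(any(v[atom] in sign for sign, atom in c) for c in clauses)


def check(n):
    """There are 2n - 2 clauses, and for every v: v(p == q) = 1 iff v |= X_equiv."""
    X = equivalence_clauses("p", "q", n)
    return len(X) == 2 * n - 2 and all(
        (a == b) == satisfies({"p": a, "q": b}, X)
        for a, b in product(truth_values(n), repeat=2)
    )


print([check(n) for n in range(2, 7)])  # [True, True, True, True, True]
```

The reason is that if $v(p) < v(q)$, then for $j = v(p)$ the clause ${>}j : p \vee {\leq}j : q$ is falsified. The symmetric case is handled by the other clause family.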




We now have all prerequisites for mimicking, in the many-valued case, the structure preserving clause form translation described in (1) and (2). We can apply the very same procedure as in the classical case to compute a CNF over $C$-clauses for a formula $\phi$. For each formula of the form (1) we simply expand the signed formula

$$\{1\} : (p_i \equiv (p_j \mathbin{op} p_k)) \qquad (5)$$

using the inverse tableau rules for $\equiv$ and $op$ (recall that the conclusion corresponds to a set of $C$-clauses, namely to $X_\equiv$). This process yields a set $X_{\phi_i}$ for each non-atomic subformula $\phi_i$ of $\phi$. To establish that $\phi$ is an $S$-tautology for some $S \subseteq N$, it is sufficient to show that

$$\{(N \setminus S) : p_\phi\} \cup \bigcup_i X_{\phi_i} \qquad (6)$$

is unsatisfiable, where $p_\phi$ is the renaming of $\phi$. Since the branching factor of each inverse tableau rule is at most $n$, we have:


Corollary 7. In every $n$-valued logic $\mathcal{L}$, for every $S : \phi \in L^s$ there is a $C$-CNF representation $X_{S:\phi}$ of $\phi$ of length $O(n^2 \cdot |\phi|)$ such that $S : \phi$ is valid iff $X_{S:\phi} \vdash \Box$.

Let  us  illustrate  the  method  with  an  example. 

Example 4. Consider three-valued strong Kleene logic with an extra negation (called SKL). For convenience we repeat the semantic definitions, which are valid for any number of truth values:

- $v(\neg\phi) = 1 - v(\phi)$
- $v({\sim}\phi) = \lceil 1 - v(\phi) \rceil$
- $v(\phi \wedge \psi) = \min(v(\phi), v(\psi))$
- $v(\phi \vee \psi) = \max(v(\phi), v(\psi))$
- $v(\phi \supset \psi) = \max(1 - v(\phi), v(\psi))$

In SKL both $\frac12$ and $1$ are designated truth values, i.e. they support validity. To establish that $\neg p \supset ({\sim}p \wedge \neg p)$ is a tautology, we therefore need to show that it is a $\{\frac12, 1\}$-tautology. This is the case iff the $C$-CNF of $\{0\} : \neg p \supset ({\sim}p \wedge \neg p)$ is unsatisfiable. To simplify things a bit, we introduce no new variables for negated atoms. Thus we have

$$\begin{array}{l} \{0\} : q \\ \{1\} : (q \equiv (\neg p \supset r)) \\ \{1\} : (r \equiv ({\sim}p \wedge \neg p)) \end{array} \qquad (7)$$

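We begin to expand the second formula (cf. (4)):

$$\frac{\{1\} : (q \equiv (\neg p \supset r))}{\begin{array}{c} \{\tfrac12, 1\} : q \\ \{0\} : \neg p \supset r \end{array} \;\Big\Vert\; \begin{array}{c} \{0\} : q \\ \{\tfrac12, 1\} : \neg p \supset r \end{array} \;\Big\Vert\; \begin{array}{c} \{0, \tfrac12\} : q \\ \{1\} : \neg p \supset r \end{array} \;\Big\Vert\; \begin{array}{c} \{1\} : q \\ \{0, \tfrac12\} : \neg p \supset r \end{array}}$$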

The formulae containing an implication have to be expanded further in order to yield the clause set $X_q$. Similarly, we compute the set $X_r$. Together with the first clause in (7), we arrive at the following set of signed clauses, which characterizes the original problem:


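1. $\{0\} : q$
2. $\{\tfrac12, 1\} : q \vee \{0\} : r$
3. $\{\tfrac12, 1\} : q \vee \{0\} : p$
4. $\{0\} : q \vee \{\tfrac12, 1\} : p \vee \{\tfrac12, 1\} : r$
5. $\{0, \tfrac12\} : q \vee \{1\} : p \vee \{1\} : r$
6. $\{1\} : q \vee \{0, \tfrac12\} : p$
7. $\{1\} : q \vee \{0, \tfrac12\} : r$
8. $\{1\} : p \vee \{\tfrac12, 1\} : r$
9. $\{0, \tfrac12\} : p \vee \{0\} : r$
10. $\{0\} : p \vee \{0, \tfrac12\} : r$
11. $\{\tfrac12, 1\} : p \vee \{1\} : r$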
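The clause set of Example 4 is small enough to check by brute force. The hypothetical Python sketch below encodes clauses 1. to 11. with $\frac12$ as the middle truth value and confirms three things:

- clauses 2. to 7. force $v(q) = \max(v(p), v(r))$, the value of $\neg p \supset r$;
- clauses 8. to 11. force $v(r) = 1 - v(p)$, the value of ${\sim}p \wedge \neg p$ when ${\sim}$ is read as $\lceil 1 - v(\phi) \rceil$ (an assumption of this sketch);
- all eleven clauses together have no model.

```python
from fractions import Fraction
from itertools import product

Z, H, O = Fraction(0), Fraction(1, 2), Fraction(1)
N = (Z, H, O)


def signed(sign, atom):
    """Signed atom S : atom."""
    return (frozenset(sign), atom)


CLAUSES = {
    1: [signed({Z}, "q")],
    2: [signed({H, O}, "q"), signed({Z}, "r")],
    3: [signed({H, O}, "q"), signed({Z}, "p")],
    4: [signed({Z}, "q"), signed({H, O}, "p"), signed({H, O}, "r")],
    5: [signed({Z, H}, "q"), signed({O}, "p"), signed({O}, "r")],
    6: [signed({O}, "q"), signed({Z, H}, "p")],
    7: [signed({O}, "q"), signed({Z, H}, "r")],
    8: [signed({O}, "p"), signed({H, O}, "r")],
    9: [signed({Z, H}, "p"), signed({Z}, "r")],
    10: [signed({Z}, "p"), signed({Z, H}, "r")],
    11: [signed({H, O}, "p"), signed({O}, "r")],
}


def satisfies(v, clauses):
    return all(any(v[a] in s for s, a in c) for c in clauses)


def models(numbers):
    """All valuations of p, q, r satisfying the clauses with the given numbers."""
    selected = [CLAUSES[i] for i in numbers]
    result = []
    for vals in product(N, repeat=3):
        v = dict(zip("pqr", vals))
        if satisfies(v, selected):
            result.append(v)
    return result


# Clauses 2.-7. force q = max(p, r), the value of (not p) implies r.
print(all(m["q"] == max(m["p"], m["r"]) for m in models(range(2, 8))))  # True
# Clauses 8.-11. force r = 1 - p.
print(all(m["r"] == 1 - m["p"] for m in models(range(8, 12))))          # True
# All eleven clauses together have no model.
print(models(range(1, 12)))                                              # []
```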


4  Signed  Resolution 


We still have to provide a resolution rule for $C$-clauses. The following rule is one of several possibilities and is very close to standard binary resolution:


$$\frac{S_1 : p \vee D_1 \quad \cdots \quad S_m : p \vee D_m}{D_1 \vee \cdots \vee D_m} \qquad \text{if } S_1 \cap \cdots \cap S_m = \emptyset \qquad (8)$$


For completeness we need a factoring rule, for reasons similar to those in the classical case.


$$\frac{S_1 : p \vee \cdots \vee S_m : p \vee D}{(S_1 \cup \cdots \cup S_m) : p \vee D} \qquad (9)$$


Soundness of rules (8) and (9) is straightforward to show. Completeness is also not hard to prove, using a semantic tree argument that is readily generalized to more than two truth values by allowing $n$-ary semantic trees. In [1] a similar result is proved with that method, and the proof can readily be adapted to the present case.
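Rules (8) and (9) can be sketched as operations on clauses represented as frozensets of (sign, atom) pairs. This Python sketch illustrates that assumed representation; it is not a prover. The example shows why rule (8) is $m$-ary: the three signs below intersect pairwise, so no binary step applies, yet their common intersection is empty. A brute-force check then confirms that the factored resolvent follows from the premises.

```python
from fractions import Fraction
from itertools import product

Z, H, O = Fraction(0), Fraction(1, 2), Fraction(1)
N = (Z, H, O)


def clause(*literals):
    """A C-clause: a frozenset of (sign, atom) pairs, signs being frozensets."""
    return frozenset((frozenset(s), a) for s, a in literals)


def resolve(premises, atom):
    """Rule (8): from S1:p v D1, ..., Sm:p v Dm with S1 & ... & Sm empty,
    derive D1 v ... v Dm. Returns None if the side condition fails."""
    signs, rest = [], set()
    for c in premises:
        on_atom = [lit for lit in c if lit[1] == atom]
        if len(on_atom) != 1:
            raise ValueError("factor first: need exactly one literal on the atom")
        signs.append(on_atom[0][0])
        rest |= c - {on_atom[0]}
    if frozenset.intersection(*signs):
        return None
    return frozenset(rest)


def factor(c, atom):
    """Rule (9): merge all literals on `atom` into one with the union of their signs."""
    on_atom = {lit for lit in c if lit[1] == atom}
    if not on_atom:
        return c
    merged = (frozenset().union(*(s for s, _ in on_atom)), atom)
    return (c - on_atom) | {merged}


def sound_on(premises, conclusion, atoms):
    """Brute force: every valuation satisfying all premises satisfies the conclusion."""
    def sat(v, c):
        return any(v[a] in s for s, a in c)
    for vals in product(N, repeat=len(atoms)):
        v = dict(zip(atoms, vals))
        if all(sat(v, c) for c in premises) and not sat(v, conclusion):
            return False
    return True


c1 = clause(({Z, H}, "p"), ({O}, "q"))
c2 = clause(({H, O}, "p"), ({Z}, "q"))
c3 = clause(({Z, O}, "p"), ({H}, "r"))

print(resolve([c1, c2], "p"))                      # None: {0,1/2} & {1/2,1} = {1/2}
r = factor(resolve([c1, c2, c3], "p"), "q")
print(r == clause(({Z, O}, "q"), ({H}, "r")))      # True: {0,1}:q v {1/2}:r
print(sound_on([c1, c2, c3], r, ["p", "q", "r"]))  # True
```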


Example 5. We continue Example 4 and show that the set of $C$-clauses generated there is unsatisfiable:

12. [1, 2] $\{0\} : r$
13. [1, 3] $\{0\} : p$
14. [8, 12] $\{1\} : p$
15. [13, 14] $\Box$

Note that only the input clauses 1., 2., 3. and 8. have actually been used in the derivation. We will come back to this issue in the next section.
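The refutation of Example 5 can be replayed with the binary ($m = 2$) instance of rule (8). In this hypothetical Python sketch, each step names the atom resolved upon, and the clause numbers match those above.

```python
from fractions import Fraction

Z, H, O = Fraction(0), Fraction(1, 2), Fraction(1)


def clause(*literals):
    """A C-clause as a frozenset of (sign, atom) pairs."""
    return frozenset((frozenset(s), a) for s, a in literals)


def resolve(c1, c2, atom):
    """Binary instance (m = 2) of rule (8) on the given atom."""
    (l1,) = [lit for lit in c1 if lit[1] == atom]
    (l2,) = [lit for lit in c2 if lit[1] == atom]
    if l1[0] & l2[0]:
        raise ValueError("signs are not disjoint; rule (8) does not apply")
    return (c1 - {l1}) | (c2 - {l2})


derived = {
    1: clause(({Z}, "q")),
    2: clause(({H, O}, "q"), ({Z}, "r")),
    3: clause(({H, O}, "q"), ({Z}, "p")),
    8: clause(({O}, "p"), ({H, O}, "r")),
}
derived[12] = resolve(derived[1], derived[2], "q")    # {0} : r
derived[13] = resolve(derived[1], derived[3], "q")    # {0} : p
derived[14] = resolve(derived[8], derived[12], "r")   # {1} : p
derived[15] = resolve(derived[13], derived[14], "p")  # the empty clause
print(derived[15] == frozenset())  # True
```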


5  Improvements 

We present a simplification which parallels (3). We have already seen in Example 5 that most of the clauses were redundant. To exploit this, we have to define a generalized notion of polarity in the presence of more than two truth values.

Definition 8 Many-Valued Polarity. Let $S : \phi \in L^s$ and let $T$ be a fully expanded inverted tableau for $S : \phi$. For each subformula $\psi$ of $\phi$, if the occurrences¹ of $\psi$ in $T$ are $S_1 : \psi, \ldots, S_m : \psi$, we say that $\psi$ occurs with polarity $R = \langle S_1, \ldots, S_m \rangle$ in $\phi$. We abbreviate this fact by $R : \psi \prec S : \phi$.

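Note that by definition of inverted tableau rules, $\emptyset \neq S_i \neq N$ holds for each $S_i$ in $R$. For each polarity $R$ we define a binary connective $\Rightarrow_R$ by

$$v(\phi \Rightarrow_R \psi) = \begin{cases} 0 & \text{if } v(\phi) \in S_i \text{ and } v(\psi) \notin S_i \text{ for some } S_i \text{ occurring in } R \\ 1 & \text{otherwise} \end{cases}$$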

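We observe that in two-valued logic $\Rightarrow_{\langle\{1\}\rangle}$ is the same connective as $\supset$, and $\Rightarrow_{\langle\{0\}\rangle}$ is the same connective as $\subset$. Moreover, $\Rightarrow_{\langle\{0\}, \{1\}\rangle}$ is the same connective as $\equiv$. Now we replace (5) by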

$$\{1\} : (p_i \Rightarrow_R (p_j \mathbin{op} p_k)) \quad \text{if } R : (p_j \mathbin{op} p_k) \prec S : \phi \qquad (10)$$

In two-valued logic we can get rid of the signs simply by writing $p$ for $\{1\} : p$ and $\neg p$ for $\{0\} : p$ everywhere. Together with the observation above, (10) collapses into (3) for two-valued logic if we associate positive polarity with $\langle\{1\}\rangle$, negative polarity with $\langle\{0\}\rangle$, and both polarities with $\langle\{0\}, \{1\}\rangle$.
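The connective $\Rightarrow_R$ is easy to state as a function of a polarity $R$, given here as a list of signs. The Python sketch below (function names hypothetical) checks the two-valued observations:

- $\Rightarrow_{\langle\{1\}\rangle}$ behaves like $\supset$;
- $\Rightarrow_{\langle\{0\}\rangle}$ behaves like the converse implication $\subset$;
- $\Rightarrow_{\langle\{0\},\{1\}\rangle}$ behaves like $\equiv$.

```python
from itertools import product


def arrow(R):
    """The connective =>_R for a polarity R (a list of signs):
    v(phi =>_R psi) = 0 iff v(phi) in S and v(psi) not in S for some S in R."""
    def connective(a, b):
        return 0 if any(a in S and b not in S for S in R) else 1
    return connective


def implies(a, b):   # phi implies psi
    return max(1 - a, b)


def converse(a, b):  # psi implies phi
    return max(a, 1 - b)


def equiv(a, b):
    return 1 if a == b else 0


pairs = list(product((0, 1), repeat=2))
print(all(arrow([{1}])(a, b) == implies(a, b) for a, b in pairs))     # True
print(all(arrow([{0}])(a, b) == converse(a, b) for a, b in pairs))    # True
print(all(arrow([{0}, {1}])(a, b) == equiv(a, b) for a, b in pairs))  # True
```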


¹ Be aware that identical strings can be different subformulas, such as the two occurrences of $p$ in $p \mathbin{op} p$. On the other hand, the same subformula can occur several times in the tableau, since some rules copy subformulas, as with $\phi$ in Example 2.




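Example 6. Let us apply Definition 8 to (7) from Example 4. Obviously, $\neg p \supset ({\sim}p \wedge \neg p)$ occurs with polarity $\langle\{0\}\rangle$ in $\{0\} : \neg p \supset ({\sim}p \wedge \neg p)$. To see the polarity of ${\sim}p \wedge \neg p$, we begin to compute the inverse tableau for $\{0\} : \neg p \supset ({\sim}p \wedge \neg p)$:

$$\frac{\{0\} : \neg p \supset ({\sim}p \wedge \neg p)}{\{1\} : \neg p \;\Big\Vert\; \{0\} : {\sim}p \wedge \neg p}$$

We see that the polarity of ${\sim}p \wedge \neg p$ is $\langle\{0\}\rangle$, too. Therefore, we substitute both occurrences of $\equiv$ in (7) by $\Rightarrow_{\langle\{0\}\rangle}$:

$$\begin{array}{l} \{0\} : q \\ \{1\} : (q \Rightarrow_{\langle\{0\}\rangle} (\neg p \supset r)) \\ \{1\} : (r \Rightarrow_{\langle\{0\}\rangle} ({\sim}p \wedge \neg p)) \end{array}$$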


As an easy exercise, the reader should verify two clause sets:

- the clause set corresponding to $\{1\} : (q \Rightarrow_{\langle\{0\}\rangle} (\neg p \supset r))$ is $\{\{\tfrac12, 1\} : q \vee \{0\} : r,\ \{\tfrac12, 1\} : q \vee \{0\} : p\}$;
- the clause set corresponding to $\{1\} : (r \Rightarrow_{\langle\{0\}\rangle} ({\sim}p \wedge \neg p))$ is $\{\{1\} : p \vee \{\tfrac12, 1\} : r\}$.

These are exactly clauses 2., 3. and 8. from the old clause set (cf. Example 4), and they were exactly the ones used in the refutation. Hence we succeeded in eliminating all clauses that were redundant in Example 5.
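The "easy exercise" can also be done by machine. This Python sketch compares, for all 27 three-valued valuations of $p, q, r$, the signed formulas with $\Rightarrow_{\langle\{0\}\rangle}$ against the claimed clause sets. It assumes the SKL connectives $v(\neg\phi) = 1 - v(\phi)$, $v(\phi \supset \psi) = \max(1 - v(\phi), v(\psi))$ and $\wedge$ as min. It also reads the extra negation as $v({\sim}\phi) = \lceil 1 - v(\phi) \rceil$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

Z, H, O = Fraction(0), Fraction(1, 2), Fraction(1)
N = (Z, H, O)
R = [{Z}]  # the polarity <{0}>


def arrow(R, a, b):
    """v(a =>_R b): 0 iff a in S and b not in S for some sign S of R, else 1."""
    return 0 if any(a in S and b not in S for S in R) else 1


def neg(a):         # v(not phi) = 1 - v(phi)
    return 1 - a


def tilde(a):       # assumed reading: v(~phi) = ceil(1 - v(phi))
    return 0 if a == 1 else 1


def implies(a, b):  # v(phi implies psi) = max(1 - v(phi), v(psi))
    return max(1 - a, b)


ok_q = ok_r = True
for p, q, r in product(N, repeat=3):
    # {1}:(q =>_<{0}> (not p implies r))  versus  {1/2,1}:q v {0}:r  and  {1/2,1}:q v {0}:p
    lhs = arrow(R, q, implies(neg(p), r)) == 1
    rhs = (q in {H, O} or r == Z) and (q in {H, O} or p == Z)
    ok_q = ok_q and lhs == rhs
    # {1}:(r =>_<{0}> (~p and not p))  versus  {1}:p v {1/2,1}:r
    lhs = arrow(R, r, min(tilde(p), neg(p))) == 1
    rhs = p == O or r in {H, O}
    ok_r = ok_r and lhs == rhs
print(ok_q, ok_r)  # True True
```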

Theorem 9. Let $X_\phi$ be a set of $C$-clauses corresponding to some $\phi \in L$, computed according to (5), and let $X'_\phi$ be a set of $C$-clauses computed according to (10). Then $X_\phi \vdash \Box$ iff $X'_\phi \vdash \Box$.

Unfortunately, in the worst case the number of generated clauses can still be quadratic in the number of truth values (use the same example as before). Suppose, however, that in a logic the maximal number of occurrences of either $\phi$ or $\psi$ with different signs in the conclusion of each rule with premise $S : \phi \mathbin{op} \psi$, over all signs $S$ and connectives $op$, is $k < n$. Then the bound $O(n^2 \cdot |\phi|)$ in the corollary above can be lowered accordingly. In classical logic we have $k = 1$ if no equivalences are present, and $k = n = 2$ otherwise. See [4] for a class of many-valued logics where $k = 1$.

We suspect that much of the work in [2] can be generalized to the many-valued case. In particular, it should be possible to prove the optimality of certain many-valued CNF translations under suitable restrictions on the connectives.

Other possible improvements concern the resolution rule. We state some well-known strategies from classical resolution in our many-valued setting. This strengthens our claim that signed resolution is a natural extension of two-valued resolution.

Definition 10 Subsumption. Let $D, E$ be two $C$-clauses. We say that $D$ is subsumed by $E$ iff for each literal $S_1 : p$ in $E$ there is a literal $S_2 : p$ in $D$ such that $S_1 \subseteq S_2$.

Having  this  definition  at  hand,  we  can  formulate  subsumption  strategies  as  in  the 
classical  case. 

Our final point in this section concerns other strategies. Among others, the set-of-support strategy, a pure rule, and deletion of tautologies (clauses containing a literal of the form $N : p$) may be formulated and proved complete based on our notion of satisfiability, just as in the two-valued case.
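Subsumption (Definition 10) and tautology deletion reduce to simple set tests on signs. Here is a hypothetical Python sketch, with clauses as frozensets of (sign, atom) pairs over the three-valued $N$:

```python
from fractions import Fraction

Z, H, O = Fraction(0), Fraction(1, 2), Fraction(1)
N = frozenset({Z, H, O})


def clause(*literals):
    """A C-clause as a frozenset of (sign, atom) pairs."""
    return frozenset((frozenset(s), a) for s, a in literals)


def subsumes(E, D):
    """Definition 10: D is subsumed by E iff each literal S1:p of E has a
    literal S2:p in D with S1 a subset of S2."""
    return all(any(a2 == a1 and s1 <= s2 for s2, a2 in D) for s1, a1 in E)


def is_tautology(D, truth_values=N):
    """A clause containing a literal N:p is satisfied by every valuation."""
    return any(s == truth_values for s, _ in D)


E = clause(({Z}, "p"))
D = clause(({Z, H}, "p"), ({O}, "q"))
print(subsumes(E, D), subsumes(D, E))              # True False
print(is_tautology(clause((N, "p"), ({Z}, "q"))))  # True
```

A clause such as $\{0\} : p \vee \{\frac12, 1\} : p$ is also valid, but this syntactic test recognizes it only after factoring (9) has merged it into $N : p$.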




6  Related  Work 


We have seen that it is possible to compute short $C$-CNF for finitely-valued logics quite efficiently, using truth value sets as signs and inverse tableau rules. The resulting set of signed clauses is "flattened out" and provides no proof-theoretic insight. In [7] (for modal and intuitionistic logic) and [8] (for Lukasiewicz logic) a different view is taken. There, a logic is characterized by clause sets whose syntax does not involve signs; instead, certain additional logical connectives like necessity $\Box$ [7] or truncated sum $\oplus$ [8] are used. The problem with this approach is that it cannot be done schematically: each new logic requires new ideas. Also, a new completeness proof of the associated resolution rule has to be carried out each time.


In [1] a translation method and a resolution rule similar to those developed in Sections 3 and 4 can be found. They are stated in somewhat different terms and, more importantly, use only single truth values as signs. Also, the translation is not linear and does not employ polarity. Thus we conjecture that our own approach is more amenable to implementation. In [9] another variant of signed resolution is investigated.

In [11] a kind of many-valued polarity is introduced for the purpose of pruning the enormous search space in non-clausal resolution. O'Hearn & Stachniak's polarity notion is defined on unsigned formulae and is close to the original definition of Murray [10]. We do not see any resemblance to the polarity notion developed in this paper.

In [6] the notion of p-resolution is defined in the context of automated reasoning in paraconsistent logics. These logics are based on truth value lattices, so resolving between literals involves not only set intersection and union, as in rules (8, 9), but also meet and join operations on the truth value lattice. On the other hand, for linearly ordered truth values and signs restricted to the form ${\geq}i$ and ${\leq}i$, p-resolution essentially coincides with rules (8, 9). This fact suggests that our method can be extended to the treatment of non-clausal paraconsistent logics (the authors of [6] assume the input to be in signed CNF).


7  First-Order  Logic 


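We consider many-valued versions² of $\forall, \exists$ which have the simplified Skolem conditions given in Table 1. Signs other than ${\geq}i$ and ${\leq}i$ are handled by splitting:

$$\frac{S : \phi}{\begin{array}{c} S_1 : \phi \\ \vdots \\ S_k : \phi \end{array}} \qquad \text{where } S = S_1 \cup \cdots \cup S_k.$$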


Adding these expansion rules extends the translation method given in Section 3 to first-order formulae.

Soundness and completeness proofs can easily be obtained by combining the usual results on Skolemization with the results of [4].


² Defined as $v_\beta((\forall y)\phi) = \min\{v_{\beta^y_u}(\phi) \mid u \in U\}$ and $v_\beta((\exists y)\phi) = \max\{v_{\beta^y_u}(\phi) \mid u \in U\}$, where min and max are interpreted naturally on $N$ and $U$ is the underlying universe.




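Table 1. Simplified Skolem rules for quantified formulae.

$$\frac{{\geq}i : (\forall x)\phi(x)}{{\geq}i : \phi(x)} \qquad \frac{{\leq}i : (\forall x)\phi(x)}{{\leq}i : \phi(f(x_1, \ldots, x_n))} \qquad \frac{{\geq}i : (\exists x)\phi(x)}{{\geq}i : \phi(f(x_1, \ldots, x_n))} \qquad \frac{{\leq}i : (\exists x)\phi(x)}{{\leq}i : \phi(x)}$$

($f$ is a new function symbol, and $x_1, \ldots, x_n$ are the variables which occur freely in the premise of the rule.)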

8  Conclusion  and  Future  Work 

We presented the first steps towards an efficient resolution system for many-valued logics by specifying a way to produce sets of clauses of feasible size which characterize the original problem. Short normal form algorithms are not widely used in classical logic, since they are not really necessary there for most problems. For many-valued logics, however, their use becomes essential. This holds in particular for applications such as hardware verification, where large formulae are to be expected.

In our CNF algorithm we defined a generalized notion of polarity which allows us to remove redundant clauses. This is, however, only the first part. The next step would be to translate the top-down renamings of [2] into a many-valued setting. For the resolution procedure we sketched some familiar improvements, like subsumption, in their many-valued version. Further work could include the investigation of successful strategies such as hyper-resolution and ordered resolution; see [1] for first steps in that direction.

References 

1. M. Baaz and C. G. Fermüller. Resolution for many-valued logics. In Proc. LPAR '92, pp. 107–118. Springer, LNAI 624, 1992.

2. T. Boy de la Tour. Minimizing the number of clauses by renaming. In Proc. 10th CADE, Kaiserslautern, pp. 558–572. Springer, Heidelberg, July 1990.

3. R. Hähnle. Towards an efficient tableau proof procedure for multiple-valued logics. In Proc. Workshop Computer Science Logic, Heidelberg, pp. 248–260. Springer, LNCS 533, 1990.

4. R. Hähnle. Uniform notation of tableaux rules for multiple-valued logics. In Proc. 20th ISMVL, Victoria, pp. 238–245. IEEE Press, 1991.

5. R. Hähnle and W. Kernig. Verification of switch-level circuits with multiple-valued logics. To appear, 1993.

6. J. J. Lu, L. J. Henschen, V. S. Subrahmanian, and N. C. A. da Costa. Reasoning in paraconsistent logics. In Automated Reasoning: Essays in Honor of Woody Bledsoe, pp. 181–210. Kluwer, 1991.

7. G. Mints. Gentzen-type systems and resolution rules, part I: Propositional logic. In Proc. COLOG-88, Tallinn, pp. 198–231. Springer, LNCS 417, 1990.

8. D. Mundici. Normal forms in infinite-valued logic: The case of one variable. In Proc. Workshop Computer Science Logic '91, Berne. Springer, LNCS, 1991.

9. N. Murray and E. Rosenthal. Resolution and path-dissolution in multiple-valued logics. In Proc. ISMIS, Charlotte, 1991.

10. N. V. Murray. Completely non-clausal theorem proving. Artificial Intelligence, 18:67–85, 1982.

11. P. O'Hearn and Z. Stachniak. A resolution framework for finitely-valued first-order logics. Journal of Symbolic Computation, 13:235–254, 1992.


Defining  Variants  of  Default  Logic 
a  Modal  Approach 


Laura  Giordano 

Dipartimento di Informatica - Università di Torino
C.so Svizzera 185 - 10149 TORINO
E-mail: laura@di.unito.it


Abstract. Recently some variants of Reiter's default logic have been proposed. These variants have been defined by altering the definition of default extension and, sometimes, also the definition of default theory. Besnard and Schaub have introduced a uniform semantic framework for them, in which the semantics of the various default logics is given in terms of Kripke structures.

In this paper a uniform syntactic characterization for these different default logics is presented. First, a modal default logic, called K-default logic (KDL), is introduced. This logic is defined similarly to Reiter's default logic, but it is based on an underlying modal logic instead of classical logic. We show how the different variants of default logic, like Schaub's Constrained Default Logic, Brewka's CDL and also Lukaszewicz' variant, can be reconstructed within KDL; for each variant a different modal translation of default rules is proposed. In this way, the differences among the variants are made explicit on a syntactic ground.


1.  Introduction 

Recently some variants of Reiter's default logic (DL) [Reiter80] have been proposed [Lucaszewicz88, Brewka91, Delgrande&Jackson91, Schaub91b] to cope with some unintuitive results that classical default logic may lead to. These variants have been defined by altering the definition of default extension and, sometimes, also the definition of default theory, as in Brewka's Cumulative Default Logic (CDL). For these variants a uniform semantic framework in terms of Kripke structures has been introduced by Besnard and Schaub [Besnard&Schaub92].

In this paper we address the problem of defining a uniform syntactic characterization for these different default logics. To this purpose a modal default logic, called K-default logic (KDL), is introduced. KDL is defined like Reiter's default logic, except that the underlying monotonic logic is the modal logic K instead of classical logic. The idea is that in such a modal default logic, modal operators allow us to capture the differences among the variants on a syntactic level, in the way default rules are stated. Reiter's default logic does not commit to assumptions, and its definition of extension keeps no track of the justifications of applied defaults. In the variants mentioned above, by contrast, this and other additional information is recorded when constructing extensions. Accordingly, the applicability condition for default rules is strengthened. In a modal default logic we can explicitly keep track of this additional information by making use of modal operators.

Section 2 defines KDL and states some of its properties. Sections 3 and 4 show how Schaub's Constrained Default Logic, Brewka's CDL and Lukaszewicz' variant can be reconstructed within KDL; for each variant a translation from default theories to K-default theories is given. In particular, the different variants require different modal translations of default rules. In this way, the differences among the variants are made explicit in syntactic terms.

As observed by Besnard and Schaub, the notion of commitment to assumptions [Poole89] can be given a very natural interpretation in a modal setting. Indeed, the notion of commitment also has a rather simple syntactic interpretation in KDL. In Section 5 we show how this suggests an alternative variant of default logic. This variant is quite similar to Brewka's CDL, is cumulative but not semimonotonic, and embodies this notion of commitment to assumptions.

2.  KDL:  a  modal  default  logic 

In  this  section  we  define  K-default  logic  (KDL),  a  default  logic  whose  underlying 
monotonic  logic  is  the  modal  logic  K.  In  K-default  logic  modal  operators  are  allowed 
both  in  the  set  W  of  formulas  and  in  the  prerequisite,  justification  and  consequent  of 
default  rules. 

For simplicity, in the following we restrict our attention to the propositional case. Let $L$ be a propositional (non-modal) language and $L^*$ the corresponding modal language in which, as usual, $\Box$ and $\Diamond$ are the universal and existential modal operators. We recall Kripke semantics for propositional modal logic K [Hughes&Cresswell68, Bowen79]. We will refer to the language containing the logical connectives $\wedge$ and $\supset$.

Let $AL$ be the set of all propositional symbols in $L^*$. A K-interpretation for $L^*$ is a quadruple $M = \langle W, R, e, w \rangle$, where:

- $W$ is a set of worlds;
- $R$ is a binary relation on $W$ (the accessibility relation);
- $e$ is a valuation function $e : W \to P(AL)$;
- $w$ is a distinguished world of $W$.

We define the satisfiability of a closed formula $\alpha$ of $L^*$ at a world $w \in W$ in a K-interpretation $M$ (written $M, w \models_K \alpha$) as follows:

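- $M, w \models_K p$ iff $p \in e(w)$ (if $p$ is in $AL$)

- $M, w \models_K \alpha \wedge \beta$ iff $M, w \models_K \alpha$ and $M, w \models_K \beta$

- $M, w \models_K \alpha \supset \beta$ iff $M, w \not\models_K \alpha$ or $M, w \models_K \beta$

- $M, w \models_K \Diamond\alpha$ iff there is a world $w' \in W$ such that $w R w'$ and $M, w' \models_K \alpha$

- $M, w \models_K \Box\alpha$ iff for all worlds $w' \in W$ such that $w R w'$, $M, w' \models_K \alpha$.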
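The satisfaction clauses above can be turned into a small model checker for finite K-interpretations. In this hypothetical Python sketch, formulas are nested tuples. Besides $\wedge$, $\supset$, $\Diamond$ and $\Box$, it accepts a "not" connective for convenience; this connective is an addition not present in the text. The example model has a single accessible world where $B$ holds. So $\Box B$ is true at the distinguished world while $B$ itself is not, which is the kind of situation exploited in Example 1 below.

```python
from dataclasses import dataclass


@dataclass
class KInterpretation:
    """A K-interpretation <W, R, e, w> over finitely many worlds."""
    worlds: set
    access: set       # the relation R as a set of pairs (w, w')
    valuation: dict   # e: world -> set of propositional symbols true there
    actual: str       # the distinguished world w

    def holds(self, phi, world=None):
        """M, world |=_K phi; defaults to the distinguished world."""
        w = self.actual if world is None else world
        if isinstance(phi, str):
            return phi in self.valuation[w]
        op, *args = phi
        if op == "and":
            return self.holds(args[0], w) and self.holds(args[1], w)
        if op == "imp":
            return not self.holds(args[0], w) or self.holds(args[1], w)
        if op == "not":  # convenience connective, not part of the text's language
            return not self.holds(args[0], w)
        successors = [u for (x, u) in self.access if x == w]
        if op == "dia":
            return any(self.holds(args[0], u) for u in successors)
        if op == "box":
            return all(self.holds(args[0], u) for u in successors)
        raise ValueError(f"unknown connective {op!r}")


M = KInterpretation(worlds={"w0", "w1"}, access={("w0", "w1")},
                    valuation={"w0": set(), "w1": {"B"}}, actual="w0")
print(M.holds(("box", "B")), M.holds("B"))   # True False
print(M.holds(("dia", ("not", "B")), "w1"))  # False: w1 has no successors
```

Because K places no conditions on $R$, a world without successors satisfies every $\Box$-formula and no $\Diamond$-formula.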

A formula $\alpha$ is true in a K-interpretation $M = \langle W, R, e, w \rangle$ (written $M \models_K \alpha$) iff $M, w \models_K \alpha$. We say that $\alpha$ is a K-valid formula ($\models_K \alpha$) iff it is true in every K-interpretation for $L^*$, i.e., iff $M, w \models_K \alpha$ for every K-interpretation $M = \langle W, R, e, w \rangle$. A formula $\alpha$ is K-consistent if there is a K-interpretation $M = \langle W, R, e, w \rangle$ such that $M, w \models_K \alpha$. In the following we denote by $Th_K(A)$ the set of logical consequences of $A$ in the logic K, that is:

$$Th_K(A) = \{\alpha : \models_K A \supset \alpha\}.$$

Let  us  now  define  the  notions  of  K-default  theory  and  KDL  extension. 

Definition  1.  A  K-default  theory  is  a  pair  (D,W)  where  W  is  a  set  of  formulas 
of  the  modal  language  L*  and  D  is  a  set  of  default  rules  of  the  form  A:B/C,  where  A, 
B  and  C  are  formulas  of  L*. 

Definition. A KDL extension of a K-default theory $(D, W)$ is a fixpoint of the operator $\Gamma$ which, given a set $S$ of $L^*$ formulas, produces the smallest set $S'$ of $L^*$ formulas such that:

(1) $W \subseteq S'$,

(2) $S'$ is closed wrt logical consequence in K, i.e., $Th_K(S') = S'$,

(3) if $(A:B/C) \in D$, $A \in S'$ and $\neg B \notin S$, then $C \in S'$.

Note  that  this  definition  of  KDL  extension  is  exactly  the  same  as  Reiter's 
definition  apart  from  the  fact  that  S'  is  required  to  be  closed  wrt  logical  consequence 
in  K  and  not  in  classical  logic  as  usual.  Hence,  most  of  the  properties  of  DL  hold 
also  for  KDL.  In  particular,  as  usual,  an  equivalent  quasi-inductive  definition  of  KDL 
extension  can  be  given. 

Definition. Let $(D, W)$ be a K-default theory. Define $E_0 = W$ and, for all $i \geq 0$,

$$E_{i+1} = Th_K(E_i) \cup \{C \mid (A:B/C) \in D,\ A \in E_i \text{ and } \neg B \notin E\}.$$

$E$ is a KDL extension of $(D, W)$ iff $E = \bigcup_{i=0}^{\infty} E_i$.
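For theories without modal operators, KDL extensions coincide with Reiter's (as noted below), so the quasi-inductive definition can be tried out with a classical truth-table prover. The hypothetical Python sketch below works by guess and check:

1. Guess a set of generating defaults.
2. Build $E = Th(W \cup \{\text{consequents}\})$.
3. Reconstruct $E_0, E_1, \ldots$ with respect to that $E$.
4. Keep the guess if the limit coincides with $E$.

A genuine KDL implementation would have to replace the truth-table test with a decision procedure for K, which is not attempted here.

```python
from itertools import combinations, product

TOP = ("and",)  # the empty conjunction, always true; used as an empty prerequisite


def atoms(f):
    """Propositional symbols of a formula (a string or a nested tuple)."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))


def value(f, v):
    """Classical truth value of f under the assignment v."""
    if isinstance(f, str):
        return v[f]
    op, *args = f
    if op == "not":
        return not value(args[0], v)
    if op == "and":
        return all(value(g, v) for g in args)
    if op == "or":
        return any(value(g, v) for g in args)
    raise ValueError(f"unknown connective {op!r}")


def entails(premises, f):
    """True iff f is in Th(premises), decided by truth tables."""
    names = sorted(set().union(atoms(f), *(atoms(g) for g in premises)))
    for bits in product((False, True), repeat=len(names)):
        v = dict(zip(names, bits))
        if all(value(g, v) for g in premises) and not value(f, v):
            return False
    return True


def extensions(defaults, W):
    """All extensions of (D, W), each given by its set of generating defaults.
    A default A:B/C is a triple (A, B, C)."""
    found = []
    for k in range(len(defaults) + 1):
        for guess in combinations(defaults, k):
            E = list(W) + [c for _, _, c in guess]
            applied, changed = [], True
            while changed:  # build E_0, E_1, ... relative to the guessed E
                changed = False
                Ei = list(W) + [c for _, _, c in applied]
                for d in defaults:
                    a, b, _ = d
                    if d not in applied and entails(Ei, a) and not entails(E, ("not", b)):
                        applied.append(d)
                        changed = True
            limit = list(W) + [c for _, _, c in applied]
            same = all(entails(limit, f) for f in E) and all(entails(E, f) for f in limit)
            if same and set(applied) not in found:
                found.append(set(applied))
    return found


d1 = (TOP, "B", "C")           # :B/C
d2 = (TOP, ("not", "B"), "D")  # :~B/D
print(extensions([d1, d2], []) == [{d1, d2}])      # True: unique extension Th({C, D})
print(extensions([(TOP, "A", ("not", "A"))], []))  # []: no extension
```

The two calls correspond to the theory of Example 1 below, whose single extension applies both defaults, and to the theory with $D = \{:A/\neg A\}$, which has no extension.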

Let  us  now  consider  some  examples  to  see  how  the  presence  of  modal  operators  in 
the  language  provides  more  flexibility  in  writing  default  rules. 

Example 1:

$D = \{:B/C,\ :\neg B/D\}$, $W = \{\}$.

$(D, W)$ has a unique (Reiter) default extension $E = Th(\{C, D\})$. It is obtained by applying both defaults, although their justifications are mutually inconsistent. Similarly, $E' = Th_K(\{C, D\})$ is the unique KDL extension of $(D, W)$. If we want to require joint consistency of justifications, we can rewrite the default theory above as follows:

$D_K = \{:\Diamond B/(C \wedge \Box C \wedge \Box B),\ :\Diamond\neg B/(D \wedge \Box D \wedge \Box\neg B)\}$, $W_K = \{\}$.

The default theory $(D_K, W_K)$ has two KDL extensions:

- $E_1 = Th_K(\{C, \Box C, \Box B\})$, obtained by applying the first default;
- $E_2 = Th_K(\{D, \Box D, \Box\neg B\})$, obtained by applying the second one.

In extension $E_1$ the second default is not applicable, since the condition $\neg\Diamond\neg B \notin E_1$ does not hold (in fact $\Box B \in E_1$).

In the theory $(D_K, W_K)$, the modal formulas $\Box B$ and $\Box\neg B$ introduced in the consequents of default rules are used to commit to the assumptions, i.e. to record the justifications of default rules when they are applied. Notice that, since $B$ is an assumption underlying extension $E_1$, $\Box B$ is in $E_1$; however, $B$ is not true in $E_1$. Hence, modal operators can be used to distinguish between two kinds of formulas in an extension:

- the beliefs that are true in it, represented by non-modal formulas like $C$ in $E_1$;
- the assumptions supporting those beliefs, represented by modal formulas like $\Box B$ in $E_1$.

Of course, when a K-default theory contains no modal operators (neither in $D$ nor in $W$), its KDL extensions are in one-to-one correspondence with its DL extensions. Hence, as in Reiter's default logic, the existence of extensions is not guaranteed in KDL either. For instance, the default theory $(D, W)$ with $D = \{:A/\neg A\}$ and $W = \{\}$ has no KDL extensions. Moreover, the following propositions hold.

Proposition  1.  A  K-default  theory  (D,W)  has  a  K-inconsistent  KDL  extension  iff 
W  is  K-inconsistent. 

Proposition  2.  If  a  K-default  theory  (D,W)  has  a  K-inconsistent  KDL  extension 
then  this  is  its  only  KDL  extension. 

In the next sections it will be shown how constrained default logic [Schaub91b], CDL [Brewka91], and Lukaszewicz' version of default logic [Lucaszewicz88] can all be mapped to K-default logic while preserving their extensions. The idea underlying these mappings is to use modal operators to record the additional information (justifications and consequents) that these variants keep track of.

3.  Mapping  constrained  default  logic  to  KDL 

Starting from the observation that Reiter's default logic lacks desirable features like "cumulativity" and "commitment to assumptions", some variants of DL have been proposed. Brewka has defined a Cumulative Default Logic (CDL) [Brewka91], which introduces assertions, i.e. formulas labelled with a support. The support of a formula records the justifications and consequents of the defaults used to derive it. In this way cumulativity is obtained, and the problem of mutually inconsistent justifications is also solved.

Two other cumulative variants of default logic, Constrained Default Logic [Schaub91b] and J-Default Logic [Delgrande&Jackson91], turn out to be equivalent. Unlike CDL, these variants do not make use of assertions; instead (in the style of Lukaszewicz) they define extensions as pairs of sets of formulas (E,C), where C is the context supporting the beliefs in E. In spite of this difference, these logics are very similar to CDL, and an equivalence result between Constrained Default Logic and CDL is given in [Schaub92].

In  this  section  we  will  define  an  interpretation  of  Constrained  Default  Logic  within 
KDL  by  giving  a  translation  of  Constrained  Default  Theories  to  K-default  theories.  A 
similar  interpretation  can  be  defined  for  Brewka's  CDL. 




In constrained default logic the language and the notion of default theory are the same as in Reiter's default logic. Let us recall the quasi-inductive definition of constrained extension in [Schaub92].

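Definition (constrained extension). Let (D,W) be a default theory. Define $E_0 = W$, $C_0 = W$ and, for all $i \ge 0$,

$$E_{i+1} = Th(E_i) \cup \{\gamma \mid (\alpha:\beta/\gamma) \in D,\ \alpha \in E_i \text{ and } C \cup \{\beta,\gamma\} \text{ is consistent}\}$$

$$C_{i+1} = Th(C_i) \cup \{\beta \wedge \gamma \mid (\alpha:\beta/\gamma) \in D,\ \alpha \in E_i \text{ and } C \cup \{\beta,\gamma\} \text{ is consistent}\}$$

(E,C) is a constrained extension of (D,W) iff $(E,C) = \left(\bigcup_{i=0}^{\infty} E_i,\ \bigcup_{i=0}^{\infty} C_i\right)$.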

In the definition above, Th denotes deductive closure in classical logic, and the consistency of C ∪ {β, γ} is also checked in classical logic. Notice that, to enforce commitment to assumptions, the justifications and consequents of applied defaults are recorded in the context C of the extension. Consider again Example 1.

Example 1 (contd.).

D = {:B/C, :¬B/D}, W = {}.

(D,W) has two constrained extensions, (Th({C}), Th({C,B})) and (Th({D}), Th({D,¬B})). By contrast, we have seen in Section 2 that (D,W) has a single Reiter extension, in which C and D are both true.
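For finite propositional theories the quasi-inductive definition can be checked mechanically. The Python sketch below is our own illustration, not part of the paper: it guesses a set S of generating defaults, builds the context from W and the formulas β ∧ γ of the defaults in S, reruns the construction with that context held fixed, and keeps S when it reproduces itself. The tuple encoding of formulas and all function names are hypothetical, and consistency is tested by brute force over truth assignments, so the sketch is only practical for a handful of atoms.

```python
from itertools import chain, combinations, product

# Hypothetical encoding of propositional formulas as nested tuples.
TOP = ("true",)
def atom(name): return ("atom", name)
def neg(f): return ("not", f)
def conj(f, g): return ("and", f, g)

def holds(f, v):
    """Truth value of formula f under assignment v (dict: atom name -> bool)."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "atom":
        return v[f[1]]
    if tag == "not":
        return not holds(f[1], v)
    return holds(f[1], v) and holds(f[2], v)  # tag == "and"

def satisfiable(fs, atoms):
    """Brute-force classical consistency check over all truth assignments."""
    return any(all(holds(f, dict(zip(atoms, bits))) for f in fs)
               for bits in product([False, True], repeat=len(atoms)))

def entails(fs, g, atoms):
    """fs |= g iff fs together with the negation of g is unsatisfiable."""
    return not satisfiable(list(fs) + [neg(g)], atoms)

def constrained_extensions(defaults, W, atoms):
    """Return each generating set S (list of default indices) of a constrained
    extension. A default is a triple (alpha, beta, gamma)."""
    found, n = [], len(defaults)
    for S in chain.from_iterable(combinations(range(n), k) for k in range(n + 1)):
        # Generators of the guessed context C = Th(W + {beta & gamma for S}).
        context = list(W) + [conj(defaults[i][1], defaults[i][2]) for i in S]
        # Re-run the quasi-inductive construction with the context held fixed.
        beliefs, applied, changed = list(W), set(), True
        while changed:
            changed = False
            for i, (alpha, beta, gamma) in enumerate(defaults):
                if (i not in applied and entails(beliefs, alpha, atoms)
                        and satisfiable(context + [beta, gamma], atoms)):
                    applied.add(i)
                    beliefs.append(gamma)
                    changed = True
        if applied == set(S):  # the guess reproduces itself
            found.append(list(S))
    return found

# Example 1: D = {:B/C, :¬B/D}, W = {}
B, C, D = atom("B"), atom("C"), atom("D")
example1 = [(TOP, B, C), (TOP, neg(B), D)]
print(constrained_extensions(example1, [], ["B", "C", "D"]))  # [[0], [1]]
```

Each returned set S stands for the pair (Th(W ∪ {γ | d ∈ S}), Th(W ∪ {β ∧ γ | d ∈ S})); for Example 1 the two results are the two constrained extensions listed above.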

In [Schaub91b] a notion of lemma default rule is introduced, by means of which cumulativity is preserved in constrained default logic. Moreover, constrained default logic is semi-monotonic, and the existence of extensions is guaranteed.

We will now give an interpretation of constrained default logic in KDL by defining a mapping from default theories to K-default theories which preserves constrained extensions. As explained above, we will use modal operators to keep track of the context.

Definition (modal interpretation for constrained default logic). Let (D,W) be a default theory. An associated K-default theory (D_S,W_S) can be defined as follows:

$$W_S = W \cup \Box W \quad\text{and}$$

$$D_S = \{\, A:\Diamond(B \wedge C)/(C \wedge \Box C \wedge \Box B) \mid (A:B/C) \in D \,\},$$

where, given a set of formulas F, $\Box F = \{\Box\alpha \mid \alpha \in F\}$.

Note that □B occurs in the conclusion of the modal default rule, and this allows commitment to the assumption B. This is in perfect agreement with the interpretation given in [Besnard&Schaub92] to the notion of commitment, i.e. that commitments correspond to formulas whose necessity holds.

It is possible to prove that there is a one-to-one correspondence between the constrained extensions of a default theory (D,W) and the KDL extensions of the associated K-default theory (D_S,W_S).




Theorem. Let (D,W) be a default theory and (D_S,W_S) its associated K-default theory. Then (E,C) is a constrained extension of (D,W) iff there is a KDL extension F of (D_S,W_S) such that

$$E = \{\alpha \mid \alpha \in L \text{ and } \alpha \in F\} \quad\text{and}\quad C = \{\alpha \mid \alpha \in L \text{ and } \Box\alpha \in F\}.$$

The proof of this theorem makes use of the quasi-inductive definitions of constrained extension and of KDL extension. Remember that L is the non-modal part of the language L*; hence E contains all the non-modal formulas in F, and C contains all the non-modal formulas α such that □α is in F.

We have seen that the notion of KDL extension is quite similar to that of Reiter's extension and does not involve recording justifications in a set of constraints, as constrained default logic does. To obtain the equivalence result above, the modal translation of the default theory (D,W) requires an appropriate use of modal operators in the default rules, so that justifications and consequents are recorded: the modal formulas in the KDL extension play the role of the context C in the constrained extension.

Notice that if a default A:B/C is in D, then D_S contains a corresponding default whose justification ◇(B ∧ C) explicitly contains C. This is because in constrained default logic default rules are implicitly regarded as seminormal (i.e. of the form A:B ∧ C/C), and this must be made explicit in the modal translation.

A similar modal interpretation within KDL can be given to CDL, the cumulative variant of DL proposed by Brewka [Brewka91]. In fact, as already mentioned above, there are precise equivalence results between constrained default logic and CDL [Schaub92]. Since CDL allows assertions (i.e. formulas with a support) in the language, to give a modal interpretation to default theories in CDL it suffices to find a suitable modal translation for assertions. The modal translation of default rules in CDL, by contrast, is the same as the one given above for constrained default logic. See [Giordano92] for details.

4. Mapping Lukaszewicz' default logic to KDL

The alternative formalization of default logic proposed in [Lucaszewicz88] is motivated by the need for a system in which the existence of extensions and semimonotonicity (i.e. monotonicity with respect to default rules) are guaranteed. This variant employs the following modified criterion of default applicability [Lucaszewicz88]: "If the prerequisite of a default is believed (its justification is consistent with what is believed), and adding its consequent to the set of beliefs neither leads to inconsistency nor contradicts the justification of this or any other already applied default, then the consequent of the default is to be believed".

Lukaszewicz defines the notion of m-extension (modified extension) of a default theory essentially as a pair (E,J), in which E is concerned with the beliefs derivable from the theory, while J is used to keep track of the justifications supporting those beliefs (the approach is similar to the one followed by Schaub). m-extensions can be defined in the following way (see [Lucaszewicz88]).

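Definition (m-extension). Let (D,W) be a default theory. Define $E_0 = W$, $J_0 = \emptyset$ and, for all $i \ge 0$,

$$E_{i+1} = Th(E_i) \cup \{\gamma \mid (\alpha:\beta/\gamma) \in D,\ \alpha \in E_i \text{ and, for each } \eta \in J \cup \{\beta\},\ E \cup \{\eta,\gamma\} \text{ is consistent}\}$$

$$J_{i+1} = J_i \cup \{\beta \mid (\alpha:\beta/\gamma) \in D,\ \alpha \in E_i \text{ and, for each } \eta \in J \cup \{\beta\},\ E \cup \{\eta,\gamma\} \text{ is consistent}\}$$

(E,J) is an m-extension of (D,W) iff $(E,J) = \left(\bigcup_{i=0}^{\infty} E_i,\ \bigcup_{i=0}^{\infty} J_i\right)$.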

Notice that in m-extensions J is not deductively closed, unlike the context C in constrained extensions. Moreover, J records only the justifications of applied defaults, not their consequents, and these justifications are not required to be jointly consistent: each of them only needs to be individually consistent with each consequent of the applied defaults. Since consistency between justifications is not checked, the default theory of Example 1, (D,W) with D = {:B/C, :¬B/D} and W = {}, has a single m-extension (Th({C,D}), {B,¬B}), which corresponds to the single Reiter extension. To see the difference between Lukaszewicz' variant and default logic, consider the following enlarged default theory.

Example 2.

D = {:B/C, :¬B/D, :¬D ∧ ¬C/E}, W = {}.

(D,W) has two m-extensions, (Th({C,D}), {B,¬B}) and (Th({E}), {¬D ∧ ¬C}). However, (D,W) has a single Reiter extension, Th({C,D}). Notice that this theory has three constrained extensions: (Th({C}), Th({C,B})), (Th({D}), Th({D,¬B})) and (Th({E}), Th({E,¬D,¬C})).

We will now define an interpretation of Lukaszewicz' default logic in KDL.

Definition (modal interpretation for Lukaszewicz' default logic). Let (D,W) be a default theory. An associated K-default theory (D_L,W_L) can be defined as follows:

$$W_L = W \cup \Box W \quad\text{and}$$

$$D_L = \{\, A:\Diamond B \wedge \Box C/(C \wedge \Box C \wedge \Diamond B) \mid (A:B/C) \in D \,\}.$$

Unlike the previous mapping, in this case ◇B is put in the consequent of default rules instead of □B. This is because in this case there is no commitment to assumptions. Hence, defaults with mutually inconsistent justifications, like ◇B and ◇¬B, can be applied together. Notice also that □C in the justification of the default rules in D_L is needed to guarantee that C is consistent with all justifications. Consider again Example 1. Its interpretation in KDL is the following:

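$$D_L = \{\, :\Diamond B \wedge \Box C/(C \wedge \Box C \wedge \Diamond B),\ \ :\Diamond\neg B \wedge \Box D/(D \wedge \Box D \wedge \Diamond\neg B) \,\}$$

$$W_L = \{\}.$$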


(D_L,W_L) has a single KDL extension

$$E = Th_K(\{C, \Box C, \Diamond B, D, \Box D, \Diamond\neg B\}),$$

corresponding to the single m-extension (Th({C,D}), {B,¬B}). Notice that E contains both ◇B and ◇¬B, and it is K-consistent.
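K-consistency of the generators of E can be confirmed by exhibiting a Kripke model, since a set of formulas is K-consistent exactly when it is satisfiable in some Kripke model. The Python sketch below is our own illustration: the three-world model and the tuple encoding are chosen by us, not taken from the paper. It evaluates the six generators at a world w0 with two successors, one where B holds and one where it fails, which is why ◇B and ◇¬B can coexist while □C and □D still hold.

```python
def holds(f, w, R, V):
    """Truth of a modal formula f at world w of the Kripke model (R, V).

    R maps each world to the list of worlds it can access; V maps each world
    to the set of atoms true there. Formulas are nested tuples.
    """
    tag = f[0]
    if tag == "atom":
        return f[1] in V[w]
    if tag == "not":
        return not holds(f[1], w, R, V)
    if tag == "and":
        return holds(f[1], w, R, V) and holds(f[2], w, R, V)
    if tag == "box":   # true in every accessible world
        return all(holds(f[1], u, R, V) for u in R[w])
    if tag == "dia":   # true in some accessible world
        return any(holds(f[1], u, R, V) for u in R[w])
    raise ValueError(f"unknown connective {tag!r}")

B, C, D = ("atom", "B"), ("atom", "C"), ("atom", "D")
generators = [C, ("box", C), ("dia", B), D, ("box", D), ("dia", ("not", B))]

# Illustrative model: w0 reaches w1 (B true) and w2 (B false); C, D hold everywhere.
R = {"w0": ["w1", "w2"], "w1": [], "w2": []}
V = {"w0": {"C", "D"}, "w1": {"B", "C", "D"}, "w2": {"C", "D"}}
print(all(holds(f, "w0", R, V) for f in generators))  # True
```

The same model falsifies □B at w0, which reflects that the Lukaszewicz translation records ◇B rather than a commitment □B.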

Also for Lukaszewicz' variant it is possible to prove that there is a one-to-one correspondence between the m-extensions of a default theory (D,W) and the KDL extensions of the associated K-default theory (D_L,W_L).

Theorem. Let (D,W) be a default theory and (D_L,W_L) its associated K-default theory. Then (E,J) is an m-extension of (D,W) iff there is a KDL extension F of (D_L,W_L) such that

$$E = \{\alpha \mid \alpha \in L \text{ and } \alpha \in F\} \quad\text{and}$$

$$J = \{\alpha \mid \alpha \in L,\ \Diamond\alpha \in F \text{ and } \Diamond\alpha \text{ occurs in the justification of some } d \in GD(F)\},$$

where $GD(F) = \{(A:B/C) \in D_L \mid A \in F \text{ and } \neg B \notin F\}$ is the set of generating defaults of F in (D_L,W_L).

Hence, J contains all α such that ◇α is in F and occurs in the justification of a generating default d of F.

As regards Reiter's default logic, we mentioned in Section 2 that there is a straightforward mapping of default theories to KDL which preserves Reiter's extensions: the identity mapping, which maps a theory (D,W) to the theory (D_R,W_R) = (D,W). Note, however, that if we modify the modal interpretation given above for Lukaszewicz' default logic by deleting □C from the justification of the default rules in D_L, we get the following alternative modal interpretation of Reiter's DL:

$$W_R = W \cup \Box W \quad\text{and}$$

$$D_R = \{\, A:\Diamond B/(C \wedge \Box C \wedge \Diamond B) \mid (A:B/C) \in D \,\}.$$

As in Lukaszewicz' variant, in this case there is no commitment to assumptions; hence ◇B, and not □B, is put in the consequent of default rules. Unlike Lukaszewicz' variant, however, □C is not included in the justification. In fact, in Reiter's default logic mutually inconsistent justifications are allowed, and the consistency of the default consequent is not required in order to apply a default.

Also for this second modal interpretation of Reiter's default theories it is possible to prove that there is a one-to-one correspondence between the Reiter extensions of a default theory (D,W) and the KDL extensions of the associated K-default theory (D_R,W_R).

5.  Commitment  to  assumptions 

The modal interpretations proposed above for the different variants of default logic mainly differ in the interpretation of default rules. Let us summarize the different translations. A default rule d = A:B/C can be translated into a KDL default rule in the following ways:

d_S = d_B = A:◇(B ∧ C)/(C ∧ □C ∧ □B)   constrained default logic and CDL,
d_L = A:◇B ∧ □C/(C ∧ □C ∧ ◇B)   Lukaszewicz' variant,
d_R = A:◇B/(C ∧ □C ∧ ◇B)   Reiter's default logic.

We have already mentioned that, on syntactic grounds, the presence of □B in the consequent of the translated default is what distinguishes the logics that commit to assumptions (i.e. constrained default logic and Brewka's CDL) from those that do not. Moreover, unlike DL, constrained default logic and CDL not only commit to assumptions but also regard default rules as seminormal, since the consequent C of d also occurs in the justification of d_S and d_B.

What kind of variant is obtained by committing to assumptions without regarding defaults as seminormal? Consider the following modal interpretation for the default d:

d_CA = A:◇B/(C ∧ □C ∧ □B).

This default rule allows commitment to the assumption B, since □B is introduced in the consequent, while the consequent C is not checked for consistency.
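The four translations differ only in the justification and in whether B enters the consequent under □ (commitment) or under ◇. The Python sketch below is a string-level illustration of our own, with hypothetical function names; it renders each KDL rule from a default A:B/C, together with the translation W ∪ □W of the facts used by the constrained, Lukaszewicz and Reiter mappings.

```python
def _wrap(f: str) -> str:
    """Parenthesise a formula unless it is a single alphanumeric atom."""
    return f if f.isalnum() else f"({f})"

def box(f: str) -> str:
    return "□" + _wrap(f)

def dia(f: str) -> str:
    return "◇" + _wrap(f)

def translate_facts(W: list[str]) -> list[str]:
    """W ∪ □W, the translation of the facts."""
    return list(W) + [box(w) for w in W]

def translate(A: str, B: str, C: str, variant: str) -> str:
    """Render the KDL rule for the default A:B/C under one of the variants."""
    if variant == "S":     # constrained default logic and CDL
        just, assumption = dia(f"{B} ∧ {C}"), box(B)
    elif variant == "L":   # Lukaszewicz' variant
        just, assumption = f"{dia(B)} ∧ {box(C)}", dia(B)
    elif variant == "R":   # Reiter's default logic (second interpretation)
        just, assumption = dia(B), dia(B)
    elif variant == "CA":  # CA-default logic
        just, assumption = dia(B), box(B)
    else:
        raise ValueError(f"unknown variant {variant!r}")
    # Every variant asserts C and □C; only the treatment of B differs.
    return f"{A} : {just} / ({C} ∧ {box(C)} ∧ {assumption})"

for v in ("S", "L", "R", "CA"):
    print(v, translate("A", "B", "C", v))
```

Printing the four variants side by side makes the syntactic criterion visible: □B appears in the consequent exactly for S and CA, the variants that commit to assumptions.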

In [Giordano&Martelli92] a cumulative variant of default logic, called CA-default logic (for commitment to assumptions default logic), has been proposed which has precisely this modal interpretation of default rules. This variant has been defined in the style of Brewka's CDL and, like CDL, it requires joint consistency of justifications. Unlike CDL and constrained default logic, however, it does not regard default rules as seminormal. Hence, it turns out to be closer to Reiter's default logic than other cumulative variants. In particular, it does not guarantee the existence of extensions and it is not semimonotonic. As such, it allows priorities between defaults to be expressed, contrary to Brewka's CDL (see [Brewka91], Section 3).

6.  Conclusions 

In  this  paper  a  uniform  interpretation  of  different  variants  of  default  logics  has  been 
given  by  mapping  them  into  a  modal  default  logic  KDL.  KDL  has  the  same  definition 
as  Reiter's  default  logic,  but  it  is  based  on  an  underlying  modal  logic,  K,  instead  of 
classical  logic. 

As  mentioned  in  the  introduction,  a  uniform  semantical  framework  has  been 
introduced  by  Besnard  and  Schaub  [Besnard&Schaub92]  for  the  variants  of  default 
logics  considered  in  the  previous  sections.  This  semantics  is  defined  in  the  same  style 
as  Etherington's  default  logic  semantics  [Etherington87]  and  as  the  semantics  for  CDL 
in  [Schaub91a],  but  it  makes  use  of  Kripke  structures.  It  must  be  noted  that,  for  the 
different  default  logics,  there  is  a  precise  correspondence  between  their  Kripke 
semantics,  as  defined  by  Besnard  and  Schaub,  and  their  translation  within  K-default 
logic  given  in  the  previous  sections. 




Acknowledgements 

I want to thank Alberto Martelli for his helpful suggestions. This paper has been partially supported by CEC, in the context of the Basic Research Action, Medlar II.

References 

[Besnard&Schaub92] P. Besnard and T. Schaub, Possible Worlds Semantics for Default Logic, in Proc. Canadian Conference on AI, 1992.

[Bowen79]  K.  Bowen,  Model  Theory  for  Modal  Logics,  Synthese  Library,  Reidel, 
Dordrecht,  1979. 

[Brewka91] G. Brewka, Cumulative Default Logic: in defense of nonmonotonic inference rules, Artificial Intelligence 50: 183-205, 1991.

[Delgrande&Jackson91] J. Delgrande and W. Jackson, Default Logic Revised. In J. Allen, R. Fikes and E. Sandewall, eds., Proc. KR'91, pp. 118-127, Morgan Kaufmann, 1991.

[Etherington87] D. Etherington, A Semantics for Default Logic, in Proc. Int. Joint Conf. on Artificial Intelligence, pp. 495-498, 1987.

[Hughes&Cresswell68]  G.E.  Hughes  and  M.J.  Cresswell,  An  Introduction  to  Modal 
Logic,  Methuen,  London,  1968. 

[Lucaszewicz88] W. Lukaszewicz, Considerations on Default Logic: an Alternative Approach, Computational Intelligence, 4: 1-6, 1988.

[Makinson89] D. Makinson, General Theory of Cumulative Inference, in M. Reinfrank, eds., Proc. Int. Workshop on Non-Monotonic Reasoning, vol. 346 of Lecture Notes in Artificial Intelligence, pp. 1-18, Springer Verlag, 1989.

[Giordano92] L. Giordano, Defining Variants of Default Logic: a Modal Approach, Technical Report, Università di Torino, 1992.

[Giordano&Martelli92] L. Giordano and A. Martelli, On Cumulative Default Logics, submitted.

[Poole89]  D.  Poole,  What  the  Lottery  Paradox  Tells  us  about  Default  Reasoning,  in 
R.  Brachman,  H.  Levesque,  and  R.  Reiter,  eds.,  Proc.  KR'89,  pp.  333-340, 
Morgan  Kaufmann,  1989. 

[Reiter80] R. Reiter, A Logic for Default Reasoning, Artificial Intelligence, 13: 81-132, 1980.

[Schaub91a] T. Schaub, Assertional Default Theories: A Semantical View. In J. Allen, R. Fikes and E. Sandewall, eds., Proc. KR'91, pp. 496-506, 1991.

[Schaub91b] T. Schaub, On Commitment and Cumulativity in Default Logics, in R. Kruse, ed., Proc. European Conference on Symbolic and Quantitative Approaches to Uncertainty, pp. 304-309, Springer, 1991.

[Schaub92] T. Schaub, On Constrained Default Theories, in Proc. 10th European Conference on Artificial Intelligence, pp. 304-308, Vienna, August 1992.


An  Admissible  Heuristic  Search  Algorithm 


Li-Yen Shue
Reza Zamani

Dept of Business Systems, Uni of Wollongong
Australia 


Abstract. This paper introduces an admissible heuristic search algorithm, the Search and Learning Algorithm (SLA*). SLA* is developed from the work presented by Korf in the Learning-Real-Time-Algorithm (LRTA*). We retain the major elements of Korf's work in LRTA*, and improve its performance by incorporating a review component that fully reflects the effect that learning new heuristic values at front states has upon the previous states. The combined strategy of search, learning, and review enables this algorithm to accumulate knowledge continuously through guided expansion, and to identify better search directions at any stage of node expansion. Assuming non-overestimating initial estimates for all nodes to the goal, this algorithm is able to find an optimal solution in a single problem-solving trial with good efficiency. We provide a proof of the optimality of the solution.


1  Introduction 

Among the optimal heuristic search algorithms for graph problems, the best known are A* [1] and IDA* [2] (Iterative-Deepening-A*). A* is a best-first search algorithm, where the heuristic function of a node, f(n), is the sum of the actual cost of reaching that node from the root state, g(n), and the estimated cost of reaching the goal state from that node, h(n). As soon as a node is selected for expansion, this algorithm adds all its successor nodes to the selection list. At any stage of state selection, all nodes in the list have to be considered, and the node with the minimum heuristic estimate is selected for expansion. One of the immediate drawbacks of this algorithm is the exponential growth of its memory requirement. IDA* was designed with the intention of reducing the space complexity of A*. Starting with the estimated initial threshold for the root state, this algorithm tries to find the next threshold by performing a series of depth-first searches. The minimum estimated value, f(n) = g(n) + h(n), of an iteration that exceeds the current threshold becomes the new threshold for the next iteration. Assuming non-overestimating initial estimates and positive edge costs between nodes, this algorithm increases its threshold in each iteration and eventually reaches the optimal solution. Because this algorithm focuses on finding the next threshold level, with no need to remember the nodes to be visited next time, its space complexity is reduced from exponential to linear. However, the repeated search for the next threshold from the root node leads to the same drawback as the A* algorithm: it requires exponential time to run in practice.
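To make the threshold mechanism concrete, here is a minimal Python sketch of IDA* as described above. It is a textbook-style rendering of our own, not code from [2]; the graph interface (a successors function yielding (neighbour, cost) pairs and a heuristic function h) is an assumption, and cycles are avoided only along the current path.

```python
import math

def ida_star(start, goal, successors, h):
    """IDA*: repeated depth-first searches bounded by f = g + h.

    successors(n) -> iterable of (neighbour, edge_cost); h(n) must not
    overestimate. Returns (path, cost), or (None, math.inf) if no path exists.
    """
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return False, f               # candidate for the next threshold
        if node == goal:
            return True, g
        next_bound = math.inf
        for nxt, cost in successors(node):
            if nxt in path:               # skip cycles on the current path
                continue
            path.append(nxt)
            found, value = search(g + cost, bound)
            if found:
                return True, value
            next_bound = min(next_bound, value)
            path.pop()
        return False, next_bound

    bound = h(start)                      # initial threshold from the root
    while True:
        found, value = search(0, bound)
        if found:
            return path, value
        if value == math.inf:
            return None, math.inf
        bound = value                     # smallest f that exceeded the old bound

# Small example: S->A (1), S->B (1), A->G (5), B->G (2), all estimates 0.
graph = {"S": [("A", 1), ("B", 1)], "A": [("G", 5)], "B": [("G", 2)], "G": []}
print(ida_star("S", "G", lambda n: graph[n], lambda n: 0))  # (['S', 'B', 'G'], 3)
```

On this graph the thresholds grow 0, 1, 3, and each new iteration restarts from the root, which is the repeated work the paragraph above points to.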

Another optimal heuristic search algorithm is LRTA* [3], the Learning Real Time Algorithm. This algorithm differs from the previous two methods in that it adopts a limited search horizon before making a move, and the heuristic estimate to the goal of a visited node may be improved as the search continues. The search horizon consists of all neighbouring nodes of a front node. The justification for improving the heuristic estimate of a node is that a node's heuristic estimate to the goal must be at least as large as the minimum, over its neighbours, of the edge cost plus the neighbour's estimate. Assuming non-overestimating initial heuristic estimates and positive edge costs between nodes, repeated applications of the problem solver will lead to the optimal solution as the effect of the edge costs finally prevails. This algorithm has obvious advantages in both space and time complexity over the previous two algorithms, although there is no guarantee of an optimal solution in any single solution trial, nor any indication of how many solving trials are needed to reach an optimal solution.
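A single LRTA* trial can be sketched as follows. This is our simplified Python reading of the description above, not Korf's code: at each front state the learned estimate becomes max(h(x), min over y of k(x,y) + h(y)), and the agent moves to the best neighbour, with ties broken by comparing state names. The shared dictionary h carries the learning from one trial to the next, and max_moves is a hypothetical safeguard on trial length.

```python
def lrta_trial(start, goal, successors, h, max_moves=10_000):
    """One LRTA*-style trial; the dict h keeps its learned values afterwards."""
    x, path = start, [start]
    for _ in range(max_moves):            # hypothetical safeguard on trial length
        if x == goal:
            return path
        options = [(cost + h[y], y) for y, cost in successors(x)]
        if not options:
            return None                   # dead end: this trial fails
        best, y = min(options)            # ties broken by comparing state names
        h[x] = max(h[x], best)            # learn: h(x) >= min_y k(x, y) + h(y)
        x = y                             # commit to the move
        path.append(x)
    return None

# Small example: S->A (1), S->B (1), A->G (5), B->G (2), all estimates 0.
graph = {"S": [("A", 1), ("B", 1)], "A": [("G", 5)], "B": [("G", 2)], "G": []}
h = {n: 0 for n in graph}
print(lrta_trial("S", "G", lambda n: graph[n], h))  # ['S', 'A', 'G'] (cost 6)
print(lrta_trial("S", "G", lambda n: graph[n], h))  # ['S', 'B', 'G'] (cost 3)
```

On this graph the first trial follows the suboptimal path, and only the learned value h(A) = 5 steers the second trial to the optimal one, which illustrates why LRTA* may need several trials.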

In this paper, we introduce an admissible heuristic search algorithm, the Search and Learning Algorithm (SLA*). Like the others, this algorithm uses the heuristic as a search vehicle and, like LRTA*, it learns from comparing the heuristic estimates of a node's neighbours. With the introduction of a review component and the combined strategy of search, learning, and review, this algorithm is able to search for a new front state and, whenever learning has occurred, to review the validity of its previous states and their respective heuristic estimates. As a result, this algorithm keeps its states and their heuristic values up to date, accounting for the full effect of heuristic learning during the search process, and finds an optimal solution in a single problem-solving trial.

2  Search  and  Learning  Algorithm 

The SLA* algorithm works under the usual assumption that the initial heuristic estimate of every state to the goal is a lower bound on its actual value. At a front state, to find the next state for expansion, it compares the heuristic values of its neighbouring states and picks the one with the minimum value, as LRTA* does. The same rationale used in LRTA* is also adopted to determine whether, and by how much, a front state's heuristic estimate can be improved. If an improvement can be made, the review component is invoked immediately after the adjustment. The front-state indicator of the algorithm then points to the immediately preceding state as the new front state, and a fresh evaluation is carried out to see whether the newly revised heuristic value of its neighbour can improve its own heuristic estimate to the goal, and by how much. The same process continues to review the previously selected states one by one in reverse order, and stops at the first state whose heuristic estimate remains unchanged after the re-evaluation. From this front state, the search part of the algorithm then resumes. As a result, every time backtracking occurs, states whose heuristic values have been modified are detached from the original path, and their new values are used to re-examine their previous states. When the search part finally resumes, the algorithm may or may not choose the same states again: because some states' heuristic values have been altered, and depending on the heuristic estimates of other states, the algorithm may choose to explore a new search direction.

With k(x,y) representing the positive edge cost from state x to a neighbouring state y, this algorithm can be implemented as follows:

Step 0: Apply a heuristic function to generate a non-overestimating initial heuristic estimate h(x) to the goal state for every state x, and continue.

Step 1: Put the root state on the backtrack list, called OPEN, and continue.

Step 2: Call the top-most state on the OPEN list x. If x is the goal state, stop; otherwise continue.

Step 3: If x is a dead-end state, replace its h(x) with a very large value, remove x from the OPEN list, and go back to step 2; otherwise continue.

Step 4: Evaluate k(x,y) + h(y) for every neighbouring state y of x, and find the state with the minimum value, breaking ties randomly. Call this state x', and continue.

Step 5: If h(x) >= k(x,x') + h(x'), then add x' to the OPEN list as the top-most state and go back to step 2; otherwise continue.

Step 6: Replace h(x) with k(x,x') + h(x'), and continue.

Step 7: If x is not the root state, remove x from the OPEN list. In either case, continue.

Step 8: Go to step 2.

Step 3 of the algorithm is designed to handle problems with dead-end states, where the goal has not yet been found and no further expansion is possible. Assigning a large value to the heuristic estimate of a dead-end state ensures that the state will not be visited again.
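Steps 0 to 8 translate fairly directly into code. The Python sketch below is our own reading of the steps, not the authors' implementation. Assumptions: OPEN is a Python list used as a stack; successors(x) returns (y, k(x, y)) pairs; the estimates of step 0 are supplied in a dictionary h that the search revises in place; the "very large value" of step 3 is math.inf; ties in step 4 are broken by listing order instead of randomly; and a state whose neighbours all have infinite estimates is treated like a dead end, an extension that lets the sketch report an unreachable goal instead of looping.

```python
import math

def sla_star(root, goal, successors, h):
    """Search-learn-review loop of SLA* (steps 1-8).

    successors(x) -> list of (y, k(x, y)) pairs with positive costs.
    h -> dict of non-overestimating estimates, revised in place.
    Returns (path, cost), or None if OPEN becomes empty.
    """
    open_list = [root]                          # step 1: OPEN, used as a stack
    while open_list:
        x = open_list[-1]                       # step 2: top-most state
        if x == goal:
            path = list(open_list)
            cost = sum(dict(successors(a))[b] for a, b in zip(path, path[1:]))
            return path, cost
        # Step 4: best neighbour by k(x, y) + h(y); min() keeps the first of
        # equal candidates, so ties follow listing order rather than chance.
        best, x_next = min(((k + h[y], y) for y, k in successors(x)),
                           key=lambda pair: pair[0], default=(math.inf, None))
        if best == math.inf:
            # Step 3 (extended): dead end, or every neighbour is hopeless;
            # give x an infinite estimate so it is never chosen again.
            h[x] = math.inf
            open_list.pop()
            continue
        if h[x] >= best:                        # step 5: go deeper
            open_list.append(x_next)
            continue
        h[x] = best                             # step 6: learn a larger estimate
        if x != root:                           # step 7: review the predecessor
            open_list.pop()
    return None                                 # goal not reachable

# Small example: S->A (1), S->B (1), A->G (5), B->G (2), all estimates 0.
graph = {"S": [("A", 1), ("B", 1)], "A": [("G", 5)], "B": [("G", 2)], "G": []}
h = {n: 0 for n in graph}
print(sla_star("S", "G", lambda x: graph[x], h))  # (['S', 'B', 'G'], 3)
```

On this small graph the review steps raise h(A) to 5 and h(B) to 2 before the search commits, so the optimal path is returned in a single trial.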




To demonstrate the operation of this algorithm, we use the graph in Figure 1. In the graph, a state is represented by a letter, and the initial heuristic estimate to the goal from a state is given by the corresponding number (the first number, in the case of states with multiple numbers). The multiple numbers of a state represent the estimates updated through the repeated review and updating operations. To simplify the calculation, we assume the edge cost to be 1 for all states. Assuming the current front state is state c, the detailed partial operation is given below:


Fig. 1 Graph with initial & updated heuristic estimates returned by SLA*

At $c$, where $h(c) = 12$: $\min\{h(a)+1,\ h(b)+1,\ h(d)+1\} = \min\{14, 15, 9\} = 9$. Since $9 < h(c)$, state d is selected as the new front state.

At $d$, where $h(d) = 8$: $\min\{h(c)+1,\ h(e)+1,\ h(f)+1\} = \min\{13, 9, 11\} = 9$. Since $9 > h(d)$, h(d) is updated to 9, and state c becomes the new front state.

At $c$, where $h(c) = 12$: $\min\{h(a)+1,\ h(b)+1,\ h(d)+1\} = \min\{14, 15, 10\} = 10$. Since $10 < h(c)$, state d is selected as the new front state.

At $d$, where $h(d) = 9$: $\min\{h(c)+1,\ h(e)+1,\ h(f)+1\} = \min\{13, 9, 11\} = 9$. Since $9 = h(d)$, state e is selected as the new front state.

At $e$, where $h(e) = 8$: $\min\{h(d)+1,\ h(g)+1,\ h(i)+1\} = \min\{10, 11, 11\} = 10$. Since $10 > h(e)$, h(e) is updated to 10, and state d becomes the new front state.

At $d$, where $h(d) = 9$: $\min\{h(c)+1,\ h(e)+1,\ h(f)+1\} = \min\{13, 11, 11\} = 11$. Since $11 > h(d)$, h(d) is updated to 11, and state c becomes the new front state.

At this stage, the heuristic estimate of state d has been updated twice and that of state e once, and they appear to be more reasonable than before. As a result, there is no evidence to support any further improvement among states a, b, c, d, e, and f; any further improvements have to be triggered by the updating of other states. Should LRTA* be applied to the same problem, its first solving trial would have improved state d to 9, moved to state e to improve its value to 10, and then continued with the states after that. It would need at least another solving trial from the root state to improve state d to 11.

3  Theorem  and  Proof 

Theorem. For a finite problem space with positive edge costs and non-overestimating initial heuristic values, in which a goal state is reachable from the root state, the application of SLA* will find a minimum-cost path.

Proof. Let P = X(1), X(2), ..., X(n) denote the root-to-goal path returned by SLA*. Let P' = Y(1), Y(2), ..., Y(t) be any other root-to-goal path, where X(1) = Y(1) is the root state of the problem.

Let  Y(m)  be  the  first  state  of  P'  which  is  not  on  P.  Thus  both  X(m)  and  Y(m)  are 
neighbours  of  their  previous  common  state  X(m-l). 

As indicated by steps 5 and 6 of the algorithm, the following relation always holds for every state X(r) on P under SLA*:

$$H(X(r)) \ge K(X(r),X(r+1)) + H(X(r+1)) \qquad (1)$$

This relation can be rearranged as

$$H(X(r)) - H(X(r+1)) \ge K(X(r),X(r+1)) \qquad (2)$$

By summing both sides of relation (2) along path P from X(m) to the goal state X(n), the following relation is obtained:

$$H(X(m)) - H(X(n)) \ge K(X(m),X(m+1)) + K(X(m+1),X(m+2)) + \dots + K(X(n-1),X(n)) \qquad (3)$$

Since the estimate from the goal state to itself is 0, H(X(n)) = 0, and relation (3) simplifies to

$$H(X(m)) \ge K(X(m),X(m+1)) + K(X(m+1),X(m+2)) + \dots + K(X(n-1),X(n)) \qquad (4)$$

The fact that SLA*, at state X(m-1), has preferred X(m) to Y(m), as indicated by step 4, also leads to the following relation:

$$K(X(m-1),Y(m)) + H(Y(m)) \ge K(X(m-1),X(m)) + H(X(m)) \qquad (5)$$




By substituting the right-hand side of relation (4) for H(X(m)) in relation (5), relation (5) can be expressed as follows:

$$K(X(m-1),Y(m)) + H(Y(m)) \ge K(X(m-1),X(m)) + K(X(m),X(m+1)) + K(X(m+1),X(m+2)) + \dots + K(X(n-1),X(n)) \qquad (6)$$

Relation (6) shows that if a path deviates from P at any state X(m-1), the estimated cost to the goal along the deviating path is always greater than or equal to the true cost of the corresponding remainder of P. Because SLA* never updates a state's heuristic to an overestimate, the true cost of the deviating path is at least this estimate, and may be much greater. Hence no root-to-goal path is cheaper than P, and the application of SLA* finds an optimal solution.
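The proof relies on only two properties of SLA*: relation (1) holds between consecutive states of the path it maintains, and step 4 moves to the neighbour minimizing K + H. The Python sketch below is a hypothetical reconstruction built from those two relations, not the algorithm listing itself (steps 1 to 6 lie outside this excerpt). It assumes the graph is a dictionary `{state: {neighbour: edge_cost}}`, that the initial heuristic `h0` never overestimates and is 0 at the goal, and that a dead end is a state with no outgoing edges.

```python
import math


def sla_star(graph, root, goal, h0=lambda s: 0):
    """Sketch of SLA*-style search: advance while relation (1) holds,
    otherwise raise H of the front state and backtrack one step.

    graph: {state: {neighbour: positive edge cost}} (assumed format).
    Returns (path, learned), where learned holds the updated H values,
    or (None, learned) if the root turns out to have no route to the goal.
    """
    learned = {}                        # only H values that differ from h0

    def H(s):
        return learned.get(s, h0(s))

    path = [root]
    while path[-1] != goal:
        x = path[-1]
        succ = graph.get(x, {})
        if succ:
            # step-4-style choice: the neighbour minimising K(x, y) + H(y)
            best = min(succ, key=lambda y: succ[y] + H(y))
            f = succ[best] + H(best)
        else:
            best, f = None, math.inf    # dead end: no way forward
        if H(x) < f:
            if f == math.inf and len(path) == 1:
                return None, learned    # goal unreachable from the root
            learned[x] = f              # learning: raise the estimate of x
            if len(path) > 1:
                path.pop()              # backtrack so the previous state is reviewed
        else:
            path.append(best)           # relation (1) holds for (x, best)
    return path, learned
```

For example, with zero initial estimates on the graph `{"S": {"A": 1, "B": 2}, "A": {"G": 5}, "B": {"G": 1}, "G": {}}`, the move toward A is withdrawn once A's estimate is raised to 5, and `sla_star(graph, "S", "G")` returns the optimal path `["S", "B", "G"]` of cost 3 in a single trial, with learned values `{"S": 3, "A": 5, "B": 1}`. Every state pushed onto the path has a strictly smaller H than its predecessor, so the path never contains a cycle.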

4  Efficiency  of  SLA* 

Because this algorithm must store the heuristic estimate for every state and remember the states on the current path at all times, its space requirement is bounded above by n plus the number of states on the path, where n is the total number of states of the problem. The number of states on the path is usually only a small fraction of n. In practice, however, the actual memory requirement can be lower, because there usually exists a function that computes the original heuristic estimates, and only the values that differ from the computed ones need to be stored in memory.
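The memory saving described above can be sketched as an overlay on a heuristic function. The class below is illustrative only: `HeuristicTable` and its attributes are hypothetical names, and the Manhattan-distance estimate in the usage lines is an assumed `h0` for a unit-cost, 4-connected grid.

```python
class HeuristicTable:
    """Heuristic store that keeps only values differing from a computed estimate.

    h0 is any function giving the original (non-overestimating) estimate;
    only learned values that differ from h0 occupy memory.
    """

    def __init__(self, h0):
        self.h0 = h0
        self.overrides = {}                   # state -> learned value

    def __getitem__(self, state):
        return self.overrides.get(state, self.h0(state))

    def __setitem__(self, state, value):
        if value == self.h0(state):
            self.overrides.pop(state, None)   # same as computed: store nothing
        else:
            self.overrides[state] = value


goal = (3, 3)
h = HeuristicTable(lambda s: abs(goal[0] - s[0]) + abs(goal[1] - s[1]))
h[(0, 0)] = 8    # differs from the computed 6, so it is stored
h[(1, 1)] = 4    # equals the computed estimate, so nothing is stored
```

Only one entry ends up in `h.overrides`, which is the effect the text describes: memory grows with the number of states whose estimates have actually been revised, not with n.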

To derive the time efficiency of SLA*, we assume that all numbers used in solving a problem are positive integers. Under this assumption, the worst case for this algorithm is n*s state visits, where s is the returned actual cost from the root state to the goal state. This worst case may occur when the initial heuristic estimates to the goal are zero for all states, so that the algorithm initially has no information to learn from, and when all edge costs from a state to its neighbours are one. The latter assumption means that each updating step improves an estimate by only one unit, and the n*s figure corresponds to the case in which every state has to be visited s times. In reality, the actual worst case is only a fraction of n*s: because all states except the root state have a smaller actual cost to the goal state than s, they do not need to be visited as many times. For most typical problems, where initial information for heuristic estimation is available and edge costs vary from one state to another, the number of state visits required before an optimal path is found can be greatly reduced.

We have experimented with both the LRTA* and SLA* algorithms on square grid problems, where a state is represented by a cell, and a path from the root to the goal is represented by the connection of the chosen cells. The borders between cells represent edge costs. To simulate problems in which some states are dead ends, the corresponding borders can be set up as barriers that cannot be crossed. With 5 different square sizes (10, 15, 20, 25, and 30) and 4 percentages of randomly assigned barriers (15%, 25%, 35%, and 45%), a total of 20 problems were tested. SLA* found the optimal solution to every problem in a single problem-solving trial, while LRTA*, as expected, required more than one trial and consequently more state visits. The ratio of the number of state visits needed by LRTA* to find the optimal solution to the number needed by SLA* ranged from 3.7 to 28.7. The trend is clear: the larger the problem and the higher the barrier percentage, the larger this ratio becomes.
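A grid problem of this kind can be generated roughly as sketched below. The details are assumptions, since the text does not specify them: cells are 4-connected, each border receives an integer cost drawn uniformly from `cost_range`, and each border independently becomes an uncrossable barrier with probability `barrier_pct / 100`. The function name `make_grid` is hypothetical.

```python
import random


def make_grid(size, barrier_pct, seed=0, cost_range=(1, 9)):
    """Build a size x size grid problem as {cell: {neighbour: cost}}.

    Assumptions (not taken from the paper): cells are 4-connected, each
    border gets an integer cost drawn uniformly from cost_range, and each
    border independently becomes a barrier with probability barrier_pct / 100.
    """
    rng = random.Random(seed)
    graph = {(r, c): {} for r in range(size) for c in range(size)}
    for r in range(size):
        for c in range(size):
            for nr, nc in ((r + 1, c), (r, c + 1)):   # visit each border once
                if nr >= size or nc >= size:
                    continue
                if rng.random() < barrier_pct / 100:
                    continue                          # barrier: no edge at all
                cost = rng.randint(*cost_range)
                graph[(r, c)][(nr, nc)] = cost        # a border can be crossed
                graph[(nr, nc)][(r, c)] = cost        # in either direction
    return graph


# The 20 size/barrier configurations of the experiment (contents are random).
problems = {(n, p): make_grid(n, p, seed=n * 100 + p)
            for n in (10, 15, 20, 25, 30) for p in (15, 25, 35, 45)}
```

Drawing the barrier decision per border rather than per cell keeps every cell a valid state while still producing dead ends, which matches the text's description of barriers as blocked borders.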

5 Conclusion

The proposed SLA* algorithm greatly improves the efficiency of the original LRTA* by incorporating a review component, which fully reflects the impact that modifying the heuristic estimate of a front state has on the previously selected states and their heuristic values. The search is guided by the continuous accumulation of changes to the heuristic values of visited states and their effects. Throughout the search, the algorithm keeps track of the "best" states at each moment. Because of the continuous effect of heuristic updating and the associated backtracking, the "best" states may change from time to time. Hence, a state on the final path may need to be revisited as many times as necessary to gradually adjust its heuristic value to its actual value. In addition, depending on the initial heuristic estimates, states not on the final path may or may not be visited. Even if they are visited, they need not be visited to the extent of revealing their actual values. This represents a great advantage of SLA* over LRTA*.

In comparison, the fundamental difference between SLA* and LRTA* lies in the ability of SLA* to review its selected states and their heuristic values, and to make appropriate adjustments whenever learning changes the heuristic estimate of a front state. LRTA* does make adjustments to front states; however, it has no capability to review the appropriateness of its previous states. In effect, LRTA* uses repeated problem-solving trials to carry out the essence of the review work, which is certainly not an efficient approach. Another difference is that SLA* is applicable to problems in which at least one path exists from the root state to the goal state; unlike LRTA*, it does not require every state to have a path to the goal.

References 

1. P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths", IEEE Trans. Syst. Sci. Cybern. 4, pp. 100-107, 1968.

2. R. E. Korf, "Depth-first iterative-deepening: An optimal admissible tree search", Artificial Intelligence 27, pp. 97-109, 1985.

3. R. E. Korf, "Real-time heuristic search", Artificial Intelligence 42, No. 2-3, pp. 189-211, March 1990.


Building  an  Expert  System  Language  Interpreter 
with  the  Rule  Network  Technique* 


Shie-Juo  Lee  and  Chih-Hung  Wu 

Department  of  Electrical  Engineering,  National  Sun  Yat-Sen  University 
Kaohsiung, Taiwan 80424, e-mail: leesj,jolinw@ee.nsysu.edu.tw


Abstract. Expert systems are increasingly prevalent in industry, business, and defense. Expert system languages are key to the development of expert systems and play a decisive role in their quality. Three major components are required in an expert system language: knowledge representation, control, and development tools. Current major expert system languages each have their own pros and cons with respect to each component. We are developing a new expert system language that combines the advantages of these languages. Knowledge is expressed in the form of facts and rules consisting of predicates, without requiring rules to be Horn clauses. Facts may include variables, and patterns are matched by unification. Control of execution is characterized by the recognize-act cycle of forward chaining, used to reason about and answer user questions. Rules can be added and deleted dynamically. Friendly debugging facilities are provided. An interpreter for the new language, using the rule network technique, has been constructed, and test results show that the language is effective.


1  Introduction 

Expert systems are increasingly prevalent in industry, business, and defense. A good expert system language can facilitate and speed up the construction of expert systems and improve their quality. Therefore, the importance of developing good expert system languages cannot be overemphasized.

Three major components are required in an expert system language: knowledge representation, control, and development tools. Current major expert system languages, such as LISP, Prolog, OPS5, and CLIPS, each have their own pros and cons with respect to each component. We are developing a new expert system language that tries to combine the advantages of the languages mentioned above while avoiding their disadvantages. Knowledge is expressed in the form of facts and rules consisting of predicates, without requiring rules to be Horn clauses. Facts may include variables, and patterns are matched by unification. Control of execution is characterized by the recognize-act cycle of forward chaining, used to reason about and answer user questions. Rules can be added and deleted dynamically. Various debugging commands are provided to help programmers investigate the behavior of their programs throughout the development process.

* Supported by the National Science Council under grant NSC 81-0408-E-110-02.

An interpreter for the new expert system language has been constructed. It applies the rule network technique and was written in C-Prolog [5]. Several problems have been tested, and the results show that the new language is effective.


2  Features  of  the  New  Expert  System  Language 

The  new  language  has  the  following  features: 

1. Each rule is expressed in the form LHS => RHS, which is the same as in CLIPS [1].

2. An LHS or an RHS is a sequence of predicates, concatenated by commas or explicit ANDs. A predicate is a predicate symbol followed by a number of terms. Definitions for predicates and terms are the same as those in first-order logic [2]. For example, father(john,mary) is a predicate indicating that John is Mary's father. Each predicate in an LHS is called a pattern, and each predicate in an RHS is called an action.

3.  A  fact  is  a  predicate.  Variables  are  allowed  to  appear  in  a  fact. 

4. A variable is a question mark (?) followed by a sequence of alphabetic or numeric characters in which the first character is alphabetic. A constant (individual constant, function constant, or predicate constant) is a sequence of alphabetic (including _) or numeric characters.

5.  Execution  is  controlled  by  forward-chaining  or  backward-chaining. 

6. It can answer two kinds of user queries: why and how. That is, it can explain how a conclusion is obtained and why a certain piece of information is needed in the process of inference.

7.  Rules  are  allowed  to  be  added  or  deleted  dynamically  through  actions  in  the  right 
hand  side  of  a  rule. 

8. The implementation is based on the Rete algorithm with some modifications.

9. A rich set of debugging commands is provided for programmers to develop their programs. Almost all the debugging commands in CLIPS are offered.

Therefore,  the  following  are  legal  rules  for  the  language: 

- father(?x,?y), ancestor(?y,?z) => ancestor(?x,?z).

- phase(choose-player), player-select(?x) => retract(phase(choose-player)), retract(player-select(?x)), assert(phase(choose-player)), write("choose c or h.").

- get(?LHS), get(?RHS) => write("Having learned a new rule."), assert(?LHS => ?RHS).

- learn-phase => write("Type in a rule or a fact."), read(?x), assert(?x).

Currently, our language interpreter provides forward chaining only. The facilities for answering why and how queries are not implemented yet.
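The lexical rules in item 4 can be checked with two regular expressions. This Python sketch is an illustration, not part of the interpreter: the name `classify` is hypothetical, and it assumes the extra character allowed in constants is the underscore.

```python
import re

# Item 4: '?' followed by alphanumerics, the first of which is alphabetic.
VARIABLE = re.compile(r"\?[A-Za-z][A-Za-z0-9]*")
# Item 4: alphabetic (underscore assumed allowed) or numeric characters.
CONSTANT = re.compile(r"[A-Za-z0-9_]+")


def classify(token):
    """Return 'variable', 'constant', or None for an illegal token."""
    if VARIABLE.fullmatch(token):
        return "variable"
    if CONSTANT.fullmatch(token):
        return "constant"
    return None
```

Using `fullmatch` rather than `match` rejects tokens such as `?x!` that only begin with a legal prefix.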


3  The  Rule  Network 


In an expert system, matching each rule individually against all facts in every cycle would clearly be inefficient. Forgy [4] proposed the Rete algorithm to solve this inefficiency problem: rules are compiled into a rule network, which consists of the pattern network and the join network. The algorithm has been used in many expert system language interpreters, including OPS5 and CLIPS. In the pattern network, facts matched to patterns are stored in alpha memories; in the join network, sequences of consistent facts matched to patterns are stored in beta memories. By maintaining the rule network, redundant pattern matching and repeated partial matching against old facts are avoided. The information contained in the rule network only needs to be recomputed when changes occur in the working memory. For example, if a set of patterns matches two of three facts in one cycle, pattern matching against these three facts is not done again in the next cycle.


3.1  The  Pattern  Network 


The process of determining which facts have matched which patterns is done in the pattern network. Tests for constant and relationship matches are done in pattern nodes, with one pattern node for each test. Each pattern is compiled into a path made of a sequence of pattern nodes in the pattern network, with an alpha memory attached at the end of the path. Different patterns may share the same pattern nodes. A pattern match occurs when a fact has satisfied a single pattern in any rule, without regard to variables in other patterns. That is to say, a pattern match corresponds to a path along which a fact has traveled successfully. When a fact travels successfully along a path, it is stored in the alpha memory attached to the path. For example, suppose we have a set of patterns {(car ford), (car ?x), (car ¬ford), (bike ?y)}, where ¬ is the negation symbol. Two pattern nodes N1 and N2 are created for the first pattern (car ford), with N1 including the following test:

the first field is equal to the constant car.

and N2 including the following test:

the second field is equal to the constant ford.

Now we compile the pattern (car ?x) and find that N1 can be shared. The variable ?x matches anything in the second field of the incoming objects that pass through N1 successfully. The third pattern, (car ¬ford), shares N1 with (car ford) and creates a pattern node N3 that performs the following test:

the second field is not equal to the constant ford.

Finally, a pattern node N4 is created for the pattern (bike ?y) and contains the following test:

the first field is equal to the constant bike.

Suppose we have four facts in the working memory: (car ford), (car benz), (car car), and (bike ford), denoted by f1, f2, f3, and f4 respectively. Then the content of the alpha memory for each pattern is listed below.

- For pattern (car ford): {f1}

- For pattern (car ?x): {f1, f2, f3}

- For pattern (car ¬ford): {f2, f3}

- For pattern (bike ?y): {f4}
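The node sharing and alpha memories of this example can be reproduced with a small Python sketch. It is an illustration under stated assumptions, not the interpreter's code: patterns and facts are tuples, `?`-prefixed fields are variables, `~ford` stands in for ¬ford, and the function names are hypothetical. A real Rete network propagates each fact once down the shared nodes; for brevity this sketch re-evaluates each pattern's list of tests instead.

```python
def compile_patterns(patterns):
    """Compile patterns into shared single-test pattern nodes.

    A node is keyed by (parent key, test), so patterns with a common prefix
    of tests share nodes. Returns (nodes, tests): nodes maps each key to a
    node number (N1, N2, ... in creation order); tests maps each pattern to
    the list of tests on its path.
    """
    nodes, tests = {}, {}
    for pat in patterns:
        parent, path = None, []              # None stands for the network root
        for i, field in enumerate(pat):
            if field.startswith("?"):
                continue                     # a variable accepts any value
            if field.startswith("~"):
                test = ("ne", i, field[1:])  # negated constant
            else:
                test = ("eq", i, field)
            key = (parent, test)
            nodes.setdefault(key, len(nodes) + 1)   # reuse the node if it exists
            parent = key
            path.append(test)
        tests[pat] = path
    return nodes, tests


def passes(fact, test):
    op, i, const = test
    if i >= len(fact):
        return False
    return fact[i] == const if op == "eq" else fact[i] != const


def alpha_memories(tests, facts):
    """For each pattern, collect the facts that pass every test on its path."""
    return {pat: [name for name, fact in facts.items()
                  if all(passes(fact, t) for t in path)]
            for pat, path in tests.items()}


patterns = [("car", "ford"), ("car", "?x"), ("car", "~ford"), ("bike", "?y")]
facts = {"f1": ("car", "ford"), "f2": ("car", "benz"),
         "f3": ("car", "car"), "f4": ("bike", "ford")}
nodes, tests = compile_patterns(patterns)
alpha = alpha_memories(tests, facts)
```

Four nodes are created, matching N1 to N4 above, and `alpha` reproduces the four alpha memories listed in the text.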




3.2 The Join Network

Each alpha memory node in the pattern network acts as one input to a node in the join network, where variable bindings are compared across patterns to ensure that variables have consistent values. The nodes in the join network, referred to as join nodes, have two input nodes: one from an alpha memory node and another from a join node. A join node contains matching tests for the content of an alpha memory and the set of partial matches that have matched previous patterns. A partial match for a rule is any set of facts that satisfies the rule's patterns, beginning with the first pattern of the rule and up to the current pattern. The partial matches for a sequence of patterns are stored in the beta memory of a join node. The first join node performs tests for the first two patterns, and each remaining join node tests the partial matches in the beta memory of the previous join node against the content of the alpha memory of an additional pattern. Partial matches are stored in join nodes and passed to the next join nodes for further partial matching. For example, suppose we have two rules containing the following LHSs:

LHS of Rule-1:                   LHS of Rule-2:

pattern1: (?x allocation ?y)     pattern1: (?x allocation ?y)

pattern2: (car ?x)               pattern2: (car ?x)

pattern3: (bike ?x)              pattern3: (truck ?x)

Then the first join node of Rule-1 would perform a test like:

The value of the 1st field of the fact bound to the 1st pattern is equal to the value of the 2nd field of the fact bound to the 2nd pattern.

The second join node of Rule-1 would receive as input the set of partial matches from the first join node and the facts that match the third pattern, and contains the test:

The  value  of  the  2nd  field  of  the  fact  bound  to  the  3rd  pattern  is  equal  to  the 
value  of  the  2nd  field  of  the  fact  bound  to  the  2nd  pattern. 

Assume we have seven facts in working memory: (car ford), (car benz), (car car), (benz benz), (bike benz), (benz allocation germany), and (ford allocation usa), referred to as f1, f2, ..., f7 respectively. Then the beta memory of the first join node of Rule-1 contains the following set of partial matches:

{{(ford allocation usa), (car ford)}, {(benz allocation germany), (car benz)}}
or 

{{f7, f1}, {f6, f2}}.

And  the  beta  memory  of  the  second  join  node  of  Rule-1  contains  the  following 
set  of  partial  matches: 

{{(benz allocation germany), (car benz), (bike benz)}}
or 

{{f6,  f2,  f5}}. 

The  resulting  rule  network  is  shown  in  Figure  1. 
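The beta memories of this example can be recomputed with the short Python sketch below. It is a simplification, not the interpreter's implementation: it folds each alpha test and its join test into one unification step per fact instead of reading facts from separate alpha memory nodes, and the names `match` and `join_chain` are hypothetical. Facts are tuples, and `?`-prefixed fields are variables.

```python
def match(pattern, fact, bindings):
    """Unify a pattern with a fact under existing variable bindings."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, v in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:
                return None      # inconsistent with an earlier binding
        elif p != v:
            return None          # constant mismatch
    return b


def join_chain(patterns, facts):
    """Return the beta memory (lists of fact names) of each join node, in order."""
    partial = [([], {})]                     # start from one empty partial match
    betas = []
    for k, pat in enumerate(patterns):
        extended = []
        for names, b in partial:
            for name, fact in facts.items():
                b2 = match(pat, fact, b)     # alpha test plus join test in one step
                if b2 is not None:
                    extended.append((names + [name], b2))
        partial = extended
        if k >= 1:                           # join nodes start at the 2nd pattern
            betas.append([names for names, _ in partial])
    return betas


facts = {"f1": ("car", "ford"), "f2": ("car", "benz"), "f3": ("car", "car"),
         "f4": ("benz", "benz"), "f5": ("bike", "benz"),
         "f6": ("benz", "allocation", "germany"),
         "f7": ("ford", "allocation", "usa")}
rule1 = [("?x", "allocation", "?y"), ("car", "?x"), ("bike", "?x")]
rule2 = [("?x", "allocation", "?y"), ("car", "?x"), ("truck", "?x")]
print(join_chain(rule1, facts))
# [[['f6', 'f2'], ['f7', 'f1']], [['f6', 'f2', 'f5']]]
```

The two lists are the beta memories {f7, f1}, {f6, f2} and {f6, f2, f5} given above. For Rule-2 the second beta memory is empty, because no (truck ...) fact exists.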

A rule whose patterns are all satisfied by facts has a partial match covering all of its patterns. Such rules are called activated rules. All the activated rules are collected in a conflict set, to which the last join node of each rule is directed. One of the rules in the conflict set is selected according to conflict resolution policies and then fired.

[Figure: the rule network, consisting of the pattern network and the join network]

Fig. 1. The rule network of the rules Rule-1 and Rule-2


4  Algorithms 

In  this  section,  we  describe  some  major  algorithms  used  in  the  construction  of  the 
interpreter  which  executes  expert  system  programs  written  in  the  new  language. 

We use C-Prolog in this project. Unification for matching facts against patterns is done by Prolog itself. Besides unification, the interpreter has to take care of:

1.  Compiling  rules  into  the  rule  network; 

2.  Performing  the  recognize-act  cycle; 

3.  Interpreting  facts  to  match  patterns  in  the  rule  network; 

4.  Adding  new  facts/rules  to  the  rule  network  or  deleting  existing  facts/rules  from 
the  rule  network  dynamically; 

5.  Keeping  the  knowledge  contained  in  the  rule  network  consistent  after  addition/deletion 
of  facts  or  rules; 

6.  Maintaining  activated  rules  in  the  conflict  set  correctly; 

7.  Conflict  resolution. 

Below, we describe in detail how rules are added and deleted dynamically and how conflicts are resolved. The other operations are simple, and their descriptions are omitted here.


4.1  Addition  of  Rules 

We use the command "insert" to add rules and facts. Once the operator => is found in the argument of an insert command, we apply the Rete algorithm to compile the new rule into the rule network. If the new rule is structurally similar to existing rules, some of the nodes, paths, and memories in the rule network can be shared by the new rule, and a flag is set on those shared memory nodes to trigger facts rematching. Moreover, new nodes and paths are created for new patterns, and a flag is set on these newly created nodes and paths, also for facts rematching. The process of adding rules can be described as follows:




Algorithm: Addition of a Rule R
begin
   if R can share pattern nodes existing in the pattern network
   then use these pattern nodes instead of creating new ones;
   if some patterns P in R already exist in the pattern network
   then use the existing paths for P;
        use the existing alpha memory nodes A of these paths for P;
        if paths J of R already exist in the join network
        then use the paths J for R;
             use the existing beta memory nodes of J for R;
        else create new join paths J' in the join network;
             set a flag at A for facts rematching;
   else create new pattern nodes/paths P' for R in the pattern network;
        create new join nodes/paths J' for R in the join network;
        set a flag at P' for facts rematching;
   call the procedure "facts rematching" on J' and P';
end.


4.2  Facts  Rematching 

Facts rematching starts from the alpha or beta memory nodes whose flags are set when the new rule shares memory nodes with existing rules because of structural similarity. All of the facts or partial matches in an alpha/beta memory node M are tested in node N, a successor node of M, together with the objects coming from the other predecessor node of N. If partial matches occur in N, the tests in node P, a successor node of N, are performed on those partial matches and the objects coming from the other predecessor node of P, and so forth. The algorithm is also called when new paths and their associated alpha memory nodes have been created. In this case, all of the facts in working memory are interpreted for pattern matching from the top node of the rule network along the new paths. Facts are stored in alpha/beta memory nodes if pattern matches/partial matches occur. Note that new rules that obtain complete matches during facts rematching are not put into the conflict set. The procedure can be described as follows.


Algorithm: Facts Rematching
begin
   S ← {};
   for each shared node N with memory node M whose flag is set
       S ← S ∪ {(N, M)};
   for each newly created alpha memory node whose flag is set
       S ← S ∪ {(the top node of the rule network, the set of facts)};
   for all elements (Start, Bank) in S do
       for all elements E of Bank do
           for all successors S1 of Start do
               if S1 is the last join node of some rule
               then return;
               if pattern matches occur at the successors S1 of Start
               then perform pattern matching/partial matching, by E,
                    on all successors of S1;
               if partial matches occur on the successors S1 of Start
               then perform partial matching, by E, on all successors of S1;
           end;
       end;
   end;
end.
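The effect of facts rematching can be illustrated with a small Python sketch that keeps, for each rule, the partial matches of its first k+1 patterns (playing the role of the beta memories). It is a hypothetical simplification, not the authors' C-Prolog code: patterns and facts are tuples, `?`-prefixed strings are variables, there is no node sharing, and the names `RuleNetwork`, `insert_rule` and `insert_fact` are illustrative. When a rule is inserted, all existing facts are rematched against it, but complete matches found at that moment are not activated, following the note above; when a fact is inserted, only the partial matches that involve the new fact are added.

```python
def unify(pattern, fact, bindings):
    """Extend bindings so that pattern matches fact; None if impossible."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, v in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:
                return None          # variable already bound to another value
        elif p != v:
            return None              # constant mismatch
    return b


def extend(partials, pattern, facts):
    """Extend each (facts used, bindings) partial match by one more pattern."""
    out = []
    for used, b in partials:
        for f in facts:
            b2 = unify(pattern, f, b)
            if b2 is not None:
                out.append((used + (f,), b2))
    return out


class RuleNetwork:
    def __init__(self):
        self.facts = []
        self.rules = {}         # rule name -> list of patterns
        self.memories = {}      # rule name -> [matches of patterns 0..k, for each k]
        self.conflict_set = []  # (rule name, bindings) of activated rules

    def insert_rule(self, name, patterns):
        # Facts rematching: run every existing fact through the new rule.
        # Complete matches found here are recorded but not activated.
        self.rules[name] = patterns
        level, memory = [((), {})], []
        for pat in patterns:
            level = extend(level, pat, self.facts)
            memory.append(level)
        self.memories[name] = memory

    def insert_fact(self, fact):
        self.facts.append(fact)
        for name, patterns in self.rules.items():
            memory = self.memories[name]
            new = extend([((), {})], patterns[0], [fact])
            fresh = [new]
            for k in range(1, len(patterns)):
                # the new fact matches pattern k after an older partial match,
                # or an already-new partial match is extended by any fact
                new = (extend(memory[k - 1], patterns[k], [fact])
                       + extend(new, patterns[k], self.facts))
                fresh.append(new)
            for k, partials in enumerate(fresh):
                memory[k].extend(partials)
            self.conflict_set.extend((name, b) for _, b in fresh[-1])


net = RuleNetwork()
for f in [("ford", "usa"), ("ford", "makes", "cars"),
          ("ford", "makes", "truck"), ("cars", "good")]:
    net.insert_fact(f)
net.insert_rule("rule002", [("ford", "usa"), ("ford", "makes", "?y"),
                            ("?y", "good"), ("?y", "cheap")])
net.insert_fact(("cars", "cheap"))
```

After the four initial facts, inserting rule002 leaves 1, 2, 1 and 0 partial matches for patterns 1, 1-2, 1-3 and 1-4. Inserting (cars cheap) then completes the stored three-pattern match, and `("rule002", {"?y": "cars"})` enters the conflict set, which is the behaviour described for the dynamic-addition example in Section 7.2.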




4.3  Deletion  of  Rules 

Rules can be deleted dynamically by their names. When a rule is retracted, the nodes, paths, and alpha/beta memory nodes associated with this rule are removed from the rule network. Each node in the rule network maintains two kinds of links: links pointing to its successor nodes, and a link pointing to its predecessor node. Because of structural similarity, a node in the rule network may be shared by several successor nodes, so it may have one or more successor links; each node, however, has only one predecessor. When a rule R is deleted from the knowledge base, the activated versions of R in the conflict set and the last join node L of R are deleted. Let J be the predecessor of L. If J has L as its only successor, then J is also deleted from the rule network. However, if J has two or more successor links, then only the successor link pointing to L is removed from J. The same process is performed on the predecessor node of J, and so forth. Here is the algorithm.

Algorithm: Deletion of a Rule R
begin
   S ← {the last join node of R};
   F ← the successor node of S pointing to R;
   if activated versions Ra of R exist in the conflict set
   then delete all Ra's;
   while there is a node K in S do
       if the number of successors of K is equal to 1
       then remove K and its memory node;
            S ← (S − {K}) ∪ {x | x is a predecessor node of K};
            F ← K;
       else remove the link pointing to F;
            return;
   end;
end.
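Because every node has exactly one predecessor, deletion amounts to a walk up the predecessor links. A minimal Python sketch under that assumption follows; `Node` and `delete_rule` are hypothetical names, activations are represented simply by rule names, and the link from the last join node to the rule itself (F in the listing) is left implicit, so the last join node starts with no successors.

```python
class Node:
    """A rule-network node with one predecessor and any number of successors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.successors = set()
        if parent is not None:
            parent.successors.add(self)


def delete_rule(rule, last_join, conflict_set):
    """Delete a rule's activations and every node used only by that rule.

    Walks from the rule's last join node toward the root, removing each node
    that has no remaining successors; stops at the first node still shared
    by another rule (after cutting its link to the removed child) or at the
    root. Returns the names of the removed nodes.
    """
    conflict_set[:] = [r for r in conflict_set if r != rule]
    removed = []
    node = last_join
    while node.parent is not None and not node.successors:
        removed.append(node.name)
        node.parent.successors.discard(node)   # remove the link pointing to node
        node = node.parent
    return removed


# Two rules sharing pattern node N1 but nothing below it.
root = Node("root")
n1 = Node("N1", root)
j1 = Node("J1", Node("N2", n1))
j2 = Node("J2", Node("N3", n1))
conflict = ["Rule-1", "Rule-2", "Rule-1"]
```

Deleting Rule-1 from `j1` removes J1 and N2 but keeps the shared node N1, whose only remaining successor is then N3; deleting Rule-2 afterwards removes J2, N3 and N1.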


5  Conflict  Resolution 

Activated rules are collected in the conflict set, and only one rule is selected and fired in each cycle. Several policies are considered when selecting the rule to be fired:

1. Rules with the largest number of condition patterns are fired first

2. Rules with the largest number of negated condition patterns are fired first

3. Rules with the most recently matched LHSs are fired first

4. New rules are fired first

The first policy has the highest priority, and the last policy has the lowest priority. If two or more rules remain after applying these policies, one of them is selected arbitrarily for firing.
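Because the policies are applied in strict priority order, they can be expressed as a lexicographic sort key. The Python sketch below is illustrative: the `Activation` fields are hypothetical, and reading "most recently matched LHS" as the time stamp of the newest fact in the match is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Activation:
    rule: str
    n_patterns: int   # condition patterns in the LHS (policy 1)
    n_negated: int    # negated condition patterns (policy 2)
    recency: int      # time stamp of the newest fact in the match (policy 3)
    rule_time: int    # time stamp at which the rule was added (policy 4)


def select(conflict_set):
    """Pick the activation preferred by policies 1-4, in priority order.

    Tuples compare lexicographically, so a later policy only breaks ties left
    by the earlier ones; max() returns the first of any remaining ties, which
    plays the role of the arbitrary choice.
    """
    return max(conflict_set,
               key=lambda a: (a.n_patterns, a.n_negated, a.recency, a.rule_time))
```

Encoding the policies as one key keeps the priority order in a single place; adding or reordering a policy only changes the tuple.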


6  The  Interpreter 

An interpreter has been constructed to execute programs written in the new expert system language. Like OPS5 [3], it basically performs recognize-act cycles until no rules can be fired. First, all rules of an input program are compiled into a rule network. Then the facts in the working memory are interpreted from the root node of the pattern network, and matched facts are stored in the appropriate alpha or beta memory nodes. Activated rules are pushed into a conflict set. Then the following operations are iterated until no rules can be fired.

1.  Select  one  activated  rule,  according  to  the  conflict  resolution  policies,  from  the 
conflict  set  and  fire  it. 

2.  Execute  the  actions  specified  in  the  right  hand  side  of  the  fired  rule. 

3.  Facts  and  rules  are  added  to  or  deleted  from  the  knowledge  base  and  the  rule 
network,  as  specified  in  the  actions  of  the  fired  rule. 

4. Due to the addition/deletion of facts/rules, the contents of the memory nodes in the rule network have to be updated, newly activated rules may be added to the conflict set, and some rules in the conflict set may be deactivated.
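The cycle can be sketched in Python without any rule network, which also shows why one is needed: this naive version rematches every rule against every fact on every cycle. Everything here is an illustrative assumption rather than the authors' interpreter: rules are `(name, LHS patterns, action)` triples, an action returns the facts to add and to delete, conflict resolution is reduced to taking the first activation, and an OPS5-style refraction set (each rule fires at most once on the same facts) is added so that the loop terminates.

```python
def unify(pattern, fact, bindings):
    """Extend bindings so that pattern matches fact; None if impossible."""
    if len(pattern) != len(fact):
        return None
    b = dict(bindings)
    for p, v in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return b


def matches(patterns, facts, bindings=None, used=()):
    """Yield (facts used, bindings) for every complete match of an LHS."""
    if bindings is None:
        bindings = {}
    if not patterns:
        yield used, bindings
        return
    for fact in facts:
        b = unify(patterns[0], fact, bindings)
        if b is not None:
            yield from matches(patterns[1:], facts, b, used + (fact,))


def run(facts, rules, max_cycles=100):
    """Run recognize-act cycles until no rule can fire (or max_cycles)."""
    facts, fired, trace = list(facts), set(), []
    for _ in range(max_cycles):
        # recognize: collect activations not fired before (refraction)
        conflict_set = [(name, used, b, action)
                        for name, lhs, action in rules
                        for used, b in matches(lhs, facts)
                        if (name, used) not in fired]
        if not conflict_set:
            break
        name, used, b, action = conflict_set[0]   # placeholder resolution
        fired.add((name, used))
        # act: apply the fired rule's additions and deletions
        adds, deletes = action(b)
        facts = ([f for f in facts if f not in deletes]
                 + [f for f in adds if f not in facts])
        trace.append(name)
    return facts, trace


rules = [
    ("r1", [("father", "?x", "?y")],
     lambda b: ([("ancestor", b["?x"], b["?y"])], [])),
    ("r2", [("father", "?x", "?y"), ("ancestor", "?y", "?z")],
     lambda b: ([("ancestor", b["?x"], b["?z"])], [])),
]
facts, trace = run([("father", "john", "mary"), ("father", "mary", "ann")], rules)
```

With the ancestor rule from Section 2 plus an assumed base rule r1, the rules fire in the order r1, r1, r2 and derive ancestor(john, ann) before the conflict set becomes empty.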

The  interpreter  was  written  in  C-Prolog  and  includes  the  following  modules: 

1.  the  rule  network  compiler. 

2.  fact  interpreter. 

3.  conflict  resolution. 

4.  action  performing. 

Figure  2  shows  the  block  diagram  of  the  interpreter. 


Fig. 2. The block diagram of the interpreter.


7  Test  and  Comparison 

Two examples are given in this section. The first example tests whether the interpreter works correctly. The second example shows the system's capability of changing rules dynamically and consistently.

7.1  Solving  the  Monkey-Banana  Problem 

The Monkey-Banana problem is described as follows: a monkey is standing on a couch near the coordinate (5,7). Some bananas hang high near the coordinate (8,2). A light ladder is near the coordinate (2,2). The goal is to let the monkey have the bananas. The problem can be translated into 8 facts and 19 rules. The rules are compiled into a rule network with 91 nodes. The solution of the problem consists of the following steps:

1. The monkey jumps off the couch.

2. The monkey walks from [5,7] to [2,2].

3.  The  monkey  grabs  the  ladder. 

4.  The  monkey  walks  from  [2,2]  to  [8,2]. 

5.  The  monkey  drops  the  ladder. 

6.  The  monkey  climbs  onto  the  ladder. 

7.  The  monkey  grabs  the  bananas. 

We obtained the solution with our system and with CLIPS, respectively, on a DEC5000/125 workstation with 16 MB of memory.

Our system. Fifteen rules were fired, i.e. 15 cycles were executed, before the problem was solved. The total CPU time was 4.41667 seconds, running under C-Prolog (version 1.5) [5].

CLIPS. Twenty-three rules were fired, i.e. 23 cycles were executed. CLIPS applies only one conflict resolution policy, which requires more cycles for this example. The total CPU time was 0.2 seconds. Note that CLIPS ran much faster than our system, partly because the CLIPS interpreter is written in C, while our interpreter is written in C-Prolog, which is much slower.


7.2 Dynamic Addition/Deletion of Rules

For the sake of comparison with CLIPS, we present this example in CLIPS format. Suppose we have a rule

(defrule rule001 (ford usa) (ford makes ?x) (?x good)
   =>
   (printout t "usa " ?x "are good." crlf) (assert (?x cheap)))

and facts (ford usa), (ford makes cars), (ford makes truck), and (cars good).

Suppose we want to add the following new rule after the first rule is fired:

(defrule rule002 (ford usa) (ford makes ?x) (?x good) (?x cheap)
   =>
   (printout t "usa " ?x "are good and cheap." crlf))

Let’s  see  how  CLIPS  and  our  system  work  on  this  problem. 

CLIPS. CLIPS does not allow us to insert the rule rule002 by putting it in the right-hand side of rule rule001, so we have to insert rule002 either manually or by loading a file containing it. After rule002 is inserted, there should obviously be a partial match from pattern 1 to pattern 3 for this rule, since (cars cheap) was added to the knowledge base when rule001 fired. However, this partial match is missing: CLIPS only maintains the partial match from pattern 1 to pattern 2 for rule002. Therefore, rule002 is neither activated nor fired. Clearly, the rule network that CLIPS maintains is not consistent.

Our system. Our system allows the problem to be described as follows:

rule(defrule rule001 (ford usa) (ford makes ?x) (?x good) =>
   (printout t "usa " ?x "are good." crlf) (getname ?name)
   (insert (defrule ?name (ford usa) (ford makes ?y) (?y good) (?y cheap) =>
              (printout t "usa " ?y " are good and cheap." crlf)))
   (insert (?x cheap))).

This rule is activated and fired. Assume ?name is bound to rule002. Then the rule rule002 in the action part of rule001 is inserted and compiled into the rule network. After the compilation, the last action of rule001 is performed: (cars cheap) is inserted into the knowledge base and the rule network is updated. We can see that rule002 has the following 3 partial matches: {(ford usa), (ford makes cars)}, {(ford usa), (ford makes cars), (cars good)}, {(ford usa), (ford makes cars), (cars good), (cars cheap)}.
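The three partial matches listed above are prefixes of rule002's pattern list that are jointly satisfied by the facts, which is what a Rete-style network keeps in its beta memories once (cars cheap) has been inserted. The Python sketch below recomputes them by joining the patterns from left to right. It is an illustration under stated assumptions, not the authors' C-Prolog interpreter: facts are assumed to be ground tuples, variables are strings starting with "?", one-way matching stands in for the system's full unification, and the names match and partial_matches are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]
Bindings = Dict[str, str]


def match(pattern: Fact, fact: Fact, env: Bindings) -> Optional[Bindings]:
    """One-way match of a pattern against a ground fact, extending env.

    Returns the extended bindings, or None if the fact does not match.
    (The paper's system uses unification; ground facts suffice here.)
    """
    if len(pattern) != len(fact):
        return None
    new_env = dict(env)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):                 # variable
            if new_env.get(p, f) != f:        # already bound to another value
                return None
            new_env[p] = f
        elif p != f:                          # constant mismatch
            return None
    return new_env


def partial_matches(patterns: List[Fact], facts: List[Fact]) -> List[List[Tuple[Fact, ...]]]:
    """For k = 1..len(patterns), list the fact tuples satisfying the first k patterns."""
    levels: List[List[Tuple[Fact, ...]]] = []
    current: List[Tuple[Tuple[Fact, ...], Bindings]] = [((), {})]
    for pattern in patterns:
        extended = []
        for matched, env in current:
            for fact in facts:
                env2 = match(pattern, fact, env)
                if env2 is not None:
                    extended.append((matched + (fact,), env2))
        current = extended
        levels.append([m for m, _ in current])
    return levels


RULE002 = [("ford", "usa"), ("ford", "makes", "?x"), ("?x", "good"), ("?x", "cheap")]
FACTS = [("ford", "usa"), ("ford", "makes", "cars"), ("ford", "makes", "truck"),
         ("cars", "good"), ("cars", "cheap")]   # (cars cheap) asserted by rule001

levels = partial_matches(RULE002, FACTS)
for k, level in enumerate(levels, start=1):
    print(f"patterns 1..{k}:", level)
```

Besides the three matches given in the text, the sketch also reports the one-pattern prefix and {(ford usa), (ford makes truck)}, which stops at pattern 3 because there is no (truck good) fact. Dropping (cars cheap) from FACTS leaves the last level empty, which is the situation before rule001 fires.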

Since  (cars  cheap)  is  more  recent  than  the  rule  rule002,  rule002  is  fired. 
Therefore,  no  inconsistency  exists  in  our  rule  network  and  it  seems  that  our 
result  is  more  desirable. 

8  Conclusion 

We have proposed a new expert system language. The language is rule-based. Patterns in the left-hand side and actions in the right-hand side of a rule are predicates. Facts are predicates and may contain variables. Patterns are matched by unification rather than by one-way matching. Execution is controlled by forward chaining. Rules can be added or deleted dynamically. A rich set of debugging commands is provided to help programmers develop their programs.

An interpreter based on the rule network was constructed to execute programs written in this new language. The interpreter was written in C-Prolog. Rules are compiled into a rule network for efficient pattern matching. Paths/nodes are created and joined into the rule network when new rules are added to the knowledge base, and are removed from the network when rules are deleted from the knowledge base. Fact rematching is performed along newly created paths in order to keep the knowledge contained in the rule network consistent. Added/deleted facts may cause the contents of alpha/beta memories to be updated. In each cycle, one rule is selected and fired, and its action part is performed.


References 

1.  Artificial  Intelligence  Section,  Lyndon  B.  Johnson  Space  Center.  CLIPS  User's 
Guide,  Reference  Manual,  and  Architecture  Manual,  May  1989. 

2.  C.  Chang  and  R.  Lee.  Symbolic  Logic  and  Mechanical  Theorem  Proving.  Academic 
Press,  New  York,  1973. 

3. C. L. Forgy. On the Efficient Implementation of Production Systems. PhD thesis,
Carnegie-Mellon University, 1979.

4.  C.  L.  Forgy.  Rete:  A  fast  algorithm  for  the  many  pattern/many  object  pattern 
match  problem.  Artificial  Intelligence,  19:17-37,  1982. 

5.  F.  Pereira.  C-Prolog  User’s  Manual.  SRI  International,  Menlo  Park,  California. 


Input-Driven Control of Rule-Based Expert Systems


Gabriel  Valiente  Feruglio* 


Universitat de les Illes Balears
Dept. de Ciències Matemàtiques i Informàtica
E-07071 Palma (Balears) Spain


Abstract. Most expert system control strategies focus on a narrow view of problem solving, with the result that expert systems fail to accomplish a good interaction with the task environment during problem solving. Known as the pregnant man problem, such deficient external behavior is typical of many expert systems that ask absurd questions of the user. Right interaction with the task environment has already been addressed as a verification problem. This paper presents a new framework for realizing a good external behavior, which consists of incorporating constraints on external behavior in the control strategy of the expert system. Called input-driven control, it cooperates with the standard control strategies of the Milord II expert system shell for realizing both a correct problem-solving behavior and an appropriate external behavior.


1  Introduction 

Expert systems attempt to reproduce the intelligent problem-solving behavior exhibited by human experts in ill-defined problem domains. A sharp boundary can be drawn between two aspects of such intelligent behavior: one is solving problems right, and the other is attaining an appropriate interaction with the task environment during problem solving.

Current approaches for developing expert systems take a somewhat narrow notion of right solutions to problems. Consider for instance a diagnosis expert system. Given a set of symptoms, any hypothesis entailed by the knowledge base (enlarged with the symptoms) is taken as a right solution, no matter how the hypothesis has been proved to be a logical consequence of the (enlarged) knowledge base. The counterpart of such a notion in logic is that of entailment.

It  may  happen,  and  is  most  often  the  case,  that  these  symptoms  are  not  given 
beforehand.  The  expert  system  has  to  interact  with  the  task  environment  in  order 
to  get  the  (external)  data.  Even  when  all  the  relevant  symptoms  are  known 
beforehand,  additional  external  data  may  be  needed  during  the  consultation 
process  in  order  for  the  expert  system  to  arrive  at  a  diagnosis. 

Right interaction with the task environment is so important from the standpoint of expert system usability and acceptability [2] [4] that failing to accomplish a good interaction with the task environment during problem solving has been recognized as the pregnant man problem, typical of many expert systems that ask absurd questions of the user. Asking whether the subject is pregnant before asking about the sex of the subject is the classical example of bad interaction with the task environment, because the question about whether the subject is pregnant makes no sense in the case of a male subject.

* Partially supported by grant DGICyT PB91-0334

The previous discussion leads to a wider notion of right solutions to problems, namely those solutions that, besides being logical consequences of the knowledge base, have been obtained through a right interaction with the task environment. The counterpart of such a notion in logic is that of a proof system.

The use of the weaker notion of solution is, in fact, rooted in a problem inherent to the very nature of the rule-based paradigm. Although the rule-based paradigm has been offered as a declarative programming tool, expert system developers have to think procedurally: they must know that the procedural reading of the knowledge base by the inference engine will consider the premises of a rule in, say, the left-to-right order in which they have been written, and must therefore write a rule like if female and ... and pregnant and ... then ... to get the expert system to ask a question about the sex of the subject before asking whether the subject is pregnant.

This issue becomes even more complicated under more complex control strategies [8] [11], such as those involving the modular structure of the knowledge base, the premise ordering within each rule, the rule ordering within each module, rule certainties, rule specificities, etc. Expert system developers usually end up reasoning about the order of module calls, rule selection, premise selection, and the like. Right external behavior is only achieved, with great effort, by trial and error: by experimenting with and modifying a prototype of the expert system until a right external behavior is observed. The main problem with this approach is that it is not systematic. Even worse, it forces expert system developers to think procedurally, reasoning about the order of rule firing instead of concentrating on achieving a good conceptualization of their problem domain.

Right interaction with the task environment has already been addressed as a verification problem [6] [5] [13] [14]. This paper presents a new framework for realizing a good external behavior, which consists of incorporating constraints on external behavior in the control strategy of the expert system.

The rest of the paper is organized as follows. The specification language being used, together with a computer-supported methodology aimed at acquiring correct and complete specifications of external behavior, is presented in section 2. The new control strategy, called input-driven control, is presented in section 3. Finally, some conclusions are drawn in section 4.

2  Problem  solving  and  external  behavior 

The relationship between problem solving and external behavior is best understood by thinking of a consultation as a path from a state in which the knowledge in the expert system is of a general nature (i.e. applicable to a wide class of problems) to a state in which the knowledge has been specialized to a particular problem (or to a more restricted class of problems) by the data provided by the task environment. Although this behavior is common to all expert system shells, it is most evident in the Milord II shell [10] [9], which uses such a specialization principle as its only inference rule.

Consider the following rules for pneumonia treatment, adapted from [9], where H-Influenzae and Legionella-sp are possible diagnoses and Quinolones and Co-trimoxazole are antibiotics.

(720) if H-Influenzae then Quinolones is possible

(721) if female and young and pregnant and Legionella-sp then Co-trimoxazole is slightly-possible

(722) if female and young and breast-feeding and Quinolones is possible then stop-breast-feeding is definite

(723) if breast-feeding and Co-trimoxazole then stop-breast-feeding is definite

Consider the case of a young female patient with a diagnosis of H-Influenzae. When the user is asked for the value of the diagnosis, suppose she answers very-possible. With this information from the task environment, the expert system specializes the knowledge base to the following.

(721) if female and young and pregnant and Legionella-sp then Co-trimoxazole is slightly-possible

(722') if female and young and breast-feeding then stop-breast-feeding is definite

(723)  if  breast-feeding  and  Co-trimoxazole  then  stop-breast-feeding  is 
definite 

Next  the  user  is  asked  for  the  sex  of  the  patient,  and  answers  female.  The 
specialized  knowledge  base  follows. 

(721')  if  young  and  pregnant  and  Legionella-sp  then  Co-trimoxazole  is 
slightly-possible 

(722")  if  young  and  breast-feeding  then  stop-breast-feeding  is  definite 

(723)  if  breast-feeding  and  Co-trimoxazole  then  stop-breast-feeding  is 
definite 

Finally  the  user  is  asked  for  the  age  of  the  patient,  and  answers  young.  The 
resulting  knowledge  base  follows. 

(721")  if  pregnant  and  Legionella-sp  then  Co-trimoxazole  is  slightly- 
possible 

(722"')  if  breast-feeding  then  stop-breast-feeding  is  definite 

(723)  if  breast-feeding  and  Co-trimoxazole  then  stop-breast-feeding  is 
definite 




This  knowledge  base  represents  a  specialization  of  the  more  general  ones  for 
treatment  of  young  female  patients  with  a  diagnosis  of  H-Influenzae. 
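The consultation above can be replayed mechanically. The Python sketch below is a simplification, not Milord II's calculus: certainty values are ignored (each answer is treated as simply true), propositions are plain strings, and a prime is appended to a rule label each time the rule is specialized, mirroring the notation used above. A rule whose premises are all satisfied disappears and its conclusion becomes known, which is how (720) turns Quinolones is possible into a fact that specializes (722).

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], str]   # (label, premises, conclusion)

KB: List[Rule] = [
    ("720", ("H-Influenzae",), "Quinolones is possible"),
    ("721", ("female", "young", "pregnant", "Legionella-sp"),
     "Co-trimoxazole is slightly-possible"),
    ("722", ("female", "young", "breast-feeding", "Quinolones is possible"),
     "stop-breast-feeding is definite"),
    ("723", ("breast-feeding", "Co-trimoxazole"), "stop-breast-feeding is definite"),
]


def specialize(kb: List[Rule], known: Set[str]) -> Tuple[List[Rule], Set[str]]:
    """Remove known premises until no rule fires any more (certainties ignored)."""
    known = set(known)
    while True:
        new_fact = False
        next_kb: List[Rule] = []
        for label, premises, conclusion in kb:
            remaining = tuple(p for p in premises if p not in known)
            if not remaining:                    # fully evaluated: rule disappears
                if conclusion not in known:
                    known.add(conclusion)        # its conclusion becomes a fact
                    new_fact = True
                continue
            if remaining != premises:            # specialized: add a prime
                label += "'"
            next_kb.append((label, remaining, conclusion))
        kb = next_kb
        if not new_fact:                         # fixpoint reached
            return kb, known


kb, known = KB, set()
for answer in ["H-Influenzae", "female", "young"]:   # answers, in the order asked
    kb, known = specialize(kb, known | {answer})
    print(answer, "->", [label for label, _, _ in kb])
```

After the third answer the sketch is left with 721'' (pregnant and Legionella-sp), 722''' (breast-feeding) and 723, matching the knowledge base (721"), (722"'), (723) obtained above.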

Now two different readings of this consultation can be made: one centered on the hypotheses being pursued and the deductions being made, and the other centered on the interaction with the task environment, that is, the questions that are asked of the user during problem solving and the order in which they are asked. It is this second reading that is known as external behavior.

The  ideal  relationship  between  problem  solving  and  external  behavior  is,  of 
course,  that  of  cooperation.  This  means  that  all  questions  asked  of  the  user  during 
the  consultation  are  made  only  when  they  are  relevant  and  necessary  to  the 
current  problem-solving  state,  and  that  the  answers  given  by  the  user  determine 
the  number  and  order  of  questions  that  follow.  This  is  just  the  behavior  that  is 
usually  thought  of  as  expert. 

Cooperation between problem solving and external behavior can be achieved by incorporating constraints on external behavior in the control strategy of the expert system. Specification concepts for external behavior are developed in the following.

Specification  of  external  behavior 

A general framework for specifying the intended external behavior of an expert system, consisting of both domain-dependent constraints on external behavior and domain-independent criteria for question selection, is proposed here. Domain-dependent constraints are divided into order relations over questions and question-relevance constraints.

Two different order relations over questions are defined: strong data dependency and weak data dependency [13] [14]. Note that the words question and external fact are often used here as synonyms. Strictly speaking, question refers to the act of asking the user for the value of the attribute expressed by an external fact.

A question $q_j$ depends weakly on a question $q_i$, noted $q_i \prec q_j$, if $q_j$ cannot be asked before $q_i$. A question $q_j$ depends strongly on a question $q_i$, noted $q_i < q_j$, if $q_i \prec q_j$ and, furthermore, $q_j$ cannot be asked unless $q_i$ has already been asked. For instance, let sex be a question about the sex of the subject and let pregnant be a question about whether the subject is pregnant. Then $\mathit{sex} < \mathit{pregnant}$ (and not just $\mathit{sex} \prec \mathit{pregnant}$), because the question about whether the subject is pregnant should by no means be asked unless the sex of the subject is known to be female (and, for the sake of this example, this can only be known by asking the user about the sex of the subject).
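Read as constraints on the order in which questions are actually asked, the two relations can be checked mechanically. The Python sketch below, with the hypothetical function respects, tests whether a question sequence (each question asked at most once, by assumption) satisfies a set of weak dependencies q_i ≺ q_j and strong dependencies q_i < q_j, both given as pairs (q_i, q_j). It only illustrates the definitions and is not part of Milord II.

```python
from typing import List, Set, Tuple

Dep = Tuple[str, str]   # (q_i, q_j): q_j depends on q_i


def respects(sequence: List[str], weak: Set[Dep], strong: Set[Dep]) -> bool:
    """Check a sequence of asked questions against weak and strong dependencies."""
    position = {q: k for k, q in enumerate(sequence)}
    # Every strong dependency is also a weak one: q_j may not precede q_i.
    for qi, qj in weak | strong:
        if qi in position and qj in position and position[qj] < position[qi]:
            return False
    # Strong dependencies additionally forbid asking q_j if q_i was never asked.
    for qi, qj in strong:
        if qj in position and qi not in position:
            return False
    return True


STRONG = {("sex", "pregnant")}

print(respects(["sex", "pregnant"], set(), STRONG))   # -> True
print(respects(["pregnant", "sex"], set(), STRONG))   # -> False (wrong order)
print(respects(["pregnant"], set(), STRONG))          # -> False (sex never asked)
print(respects(["pregnant"], STRONG, set()))          # -> True (weak only)
```

The last two calls show the difference between the relations: a weak dependency only constrains order, so asking pregnant without ever asking sex is allowed, whereas the strong dependency rules it out.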

Question-relevance constraints go somewhat deeper into question order by also considering the answers (values) to questions. Similar constraints are present in some commercial expert system shells, such as the presupposition statement in Teknowledge's M.1, although their evaluation may have undesirable side effects.

In the case of the Milord II shell [10] [9], question-relevance constraints are predicates of the form q unless condition, where condition is a premise whose evaluation does not produce any side effect. That is, such conditions are just tested against the current state of the execution, without originating module calls, rule firings, deductions, or further questions asked of the user.

Returning to the previous example, a constraint like pregnant unless sex = male would be more appropriate than the strong data dependency sex < pregnant. (Discovering a data dependency in a problem domain is, however, a step in the direction of discovering a possible question-relevance constraint.)

It must be noted that the distinction made between question relevance and question order is, to a certain extent, relative, because in the end question relevance and question order are two sides of the same coin. Pushing the notion of question relevance to the limit, an expert system that always asks the most relevant question will ask all the relevant questions in the right order.

Domain-independent criteria for question selection are intended to cover those cases in which domain-dependent constraints on external behavior do not suffice for selecting the next question to ask of the user. These criteria therefore serve to further constrain the set of possible questions to be asked of the user. Typical domain-independent criteria include measures of cost, risk, etc. associated with each question (external fact) in the knowledge base, and, of course, the (lexicographical) order in which external facts have been declared in the knowledge base modules.
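One way to apply such criteria is as a lexicographic sort key over the questions that survive the domain-dependent constraints. The Python sketch below assumes, hypothetically, numeric cost and risk tables and the priority cost, then risk, then declaration order; the paper does not fix these priorities, and the function name break_tie, the question names and the numbers are made up for illustration.

```python
from typing import Dict, List


def break_tie(candidates: List[str], cost: Dict[str, float],
              risk: Dict[str, float], declared: List[str]) -> str:
    """Pick the cheapest, then least risky, then earliest-declared question."""
    order = {q: k for k, q in enumerate(declared)}
    return min(candidates, key=lambda q: (cost[q], risk[q], order[q]))


# Hypothetical admissible questions with made-up cost and risk figures.
DECLARED = ["temperature", "blood_test", "biopsy"]
COST = {"temperature": 1.0, "blood_test": 20.0, "biopsy": 20.0}
RISK = {"temperature": 0.0, "blood_test": 0.1, "biopsy": 0.5}

print(break_tie(["biopsy", "blood_test"], COST, RISK, DECLARED))   # -> blood_test
```

Because Python compares tuples element by element, risk only matters when costs tie, and declaration order only when both tie.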

When confronted with the problem of defining constraints on question relevance and order for a relatively large expert system, no human expert would arrive at a complete specification without some systematic way of focusing attention on small sets of highly related questions. A methodology has been developed that provides detailed step-by-step instructions for arriving at a complete (and correct) definition of the domain-dependent component of a specification of external behavior. It is based on the question-matrix method of Herod and Bahill [6] [5], with some extensions for dealing with a large number of questions (coupling them with the modular techniques underlying the base expert system language) and for defining a more comprehensive specification of external behavior. The methodology has been omitted here due to severe space limitations, and the reader is referred to [15] for a complete description.

The acquisition methodology is supported by a tool called QuestionnAIre, which has been implemented with a direct-manipulation interface for filling in the different question matrices. The tool can be used with knowledge bases written for the Milord II shell [10] [9] and is being applied to the development of ENS-AI [1], an expert system for assisting teachers in dealing with pedagogical problems.

3  Input-driven  control  of  rule-based  expert  systems 

Control strategies are usually described in terms of the classical recognize-act cycle and the associated concept of conflict resolution [8] [11]. An alternative formulation, centered on question selection and therefore more appropriate for studying aspects of external behavior, replaces the recognize-act cycle by an equivalent specialize-ask cycle. (The specialize-ask cycle does not preclude the use of conflict resolution criteria, however.)




The specialize-ask cycle underlies most partial evaluation architectures, such as that of the Milord II shell [10] [9]. Specialization is achieved by applying to the knowledge base a specialization inference rule that, given a proposition A with certainty value a and a rule of the form if A and B then C with certainty value p, specializes the rule to one of the form if B then C with certainty value p', where p' is computed by modus ponens from a and p [9].
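The specialization step can be sketched as follows. Milord II computes p' with the modus ponens of its own many-valued logic [9]; purely as an assumption for illustration, the Python sketch below takes certainty values in [0, 1] and computes p' with a t-norm (minimum by default, product as an alternative). The Rule class and the specialize function are hypothetical. With a commutative and associative t-norm, specializing on A and then on B yields the same certainty as the opposite order, so the order in which answers arrive does not change the fully specialized result.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

TNorm = Callable[[float, float], float]


@dataclass(frozen=True)
class Rule:
    premises: Tuple[str, ...]
    conclusion: str
    certainty: float          # certainty value p of the rule


def specialize(rule: Rule, prop: str, a: float, tnorm: TNorm = min) -> Optional[Rule]:
    """From A (certainty a) and 'if A and B then C' (certainty p),
    derive 'if B then C' with certainty p' = tnorm(a, p).
    Returns None if prop is not a premise of the rule."""
    if prop not in rule.premises:
        return None
    remaining = tuple(q for q in rule.premises if q != prop)
    return Rule(remaining, rule.conclusion, tnorm(a, rule.certainty))


def product(x: float, y: float) -> float:
    return x * y


r = Rule(("A", "B"), "C", 0.8)
print(specialize(r, "A", 0.6))        # min t-norm: certainty 0.6

# Order independence under the product t-norm.
ab = specialize(specialize(r, "A", 0.6, product), "B", 0.9, product)
ba = specialize(specialize(r, "B", 0.9, product), "A", 0.6, product)
print(ab.premises, math.isclose(ab.certainty, ba.certainty))   # -> () True
```

A rule whose premise list becomes empty, as ab does here, simply asserts its conclusion C with the accumulated certainty.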

The ask step is performed following a question selection strategy. The standard question selection strategy in Milord II reproduces the conflict resolution criteria of the Milord shell [3] [7], namely rule certainty, rule specificity and (lexicographical) order of rules [12].

Question selection is called heuristic if it follows the specification of external behavior defined for the problem domain when deciding which question to ask next. Heuristic question selection has been implemented in Milord II as an input-driven control strategy that cooperates with the standard question selection strategy used in the Milord II shell.

Input-driven control requires a “constructive” interpretation of the specification of external behavior. Given any intermediate problem-solving state and a set of potential questions to ask next, this interpretation determines which question(s) satisfy the specification and can therefore be asked next. A question q can be asked next in a given problem-solving state if all the following conditions are satisfied:

- The question q has not been asked, i.e. it has not been assigned a value in the given problem-solving state;

- For all strong data dependencies of the form $p < q$, the question p has already been asked;

- For all weak data dependencies of the form $q \prec r$, the question r has not been asked yet; and

- For all question-relevance constraints of the form q unless condition, condition is not satisfied by the current problem-solving state.
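The four conditions translate almost literally into a predicate. In the Python sketch below (hypothetical names throughout), a problem-solving state is assumed to be a dictionary from asked questions to their answers, strong dependencies p < q and weak dependencies q ≺ r are sets of pairs, and each question-relevance constraint q unless condition is a function that only reads the state, mirroring the side-effect-free evaluation required by Milord II. The weak dependency age ≺ pregnant in the usage lines is made up for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, str]                       # asked question -> answer
Condition = Callable[[State], bool]          # must only read the state


def can_ask(q: str, state: State, strong: Set[Tuple[str, str]],
            weak: Set[Tuple[str, str]], unless: Dict[str, List[Condition]]) -> bool:
    # 1. q has not been asked (has no value yet).
    if q in state:
        return False
    # 2. For every strong dependency p < q, p has already been asked.
    if any(p not in state for p, target in strong if target == q):
        return False
    # 3. For every weak dependency q ≺ r, r has not been asked yet.
    if any(r in state for source, r in weak if source == q):
        return False
    # 4. No relevance constraint "q unless condition" is triggered.
    if any(condition(state) for condition in unless.get(q, [])):
        return False
    return True


STRONG = {("sex", "pregnant")}
WEAK = {("age", "pregnant")}
UNLESS = {"pregnant": [lambda s: s.get("sex") == "male"]}

print(can_ask("pregnant", {}, STRONG, WEAK, UNLESS))                 # -> False (sex unknown)
print(can_ask("pregnant", {"sex": "male"}, STRONG, WEAK, UNLESS))    # -> False (irrelevant)
print(can_ask("pregnant", {"sex": "female"}, STRONG, WEAK, UNLESS))  # -> True
print(can_ask("age", {"sex": "female", "pregnant": "no"}, STRONG, WEAK, UNLESS))  # -> False
```

The last call shows condition 3 at work: once pregnant has been answered, the weak dependency age ≺ pregnant rules out asking age afterwards.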

Notice that weak data dependencies of the form $p \prec q$ do not prevent the expert system from asking question q next, but they would later prevent it from asking question p if question q is asked first.

The constructive interpretation of the specification of external behavior has been implemented as a cooperating component of the Milord II expert system shell [10] [9]. Cooperation is achieved as follows:

- KB specialization is performed as usual, by iterative application of the specialization inference rule until no further specialization is possible given the available external data.

- Question selection according to the standard control strategy of Milord II, which ranks candidate rules according to certainties, specificities, and (lexicographical) order, prevails unless it violates the specification of external behavior. In such a case, heuristic question selection takes control and selects the first question, according to the ranking just made by the standard control strategy, that does not violate the specification of external behavior.

- The selected question is asked of the user, and the specialize-ask cycle continues until a stop condition is reached.
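The cooperation scheme can be sketched as a selection function layered on top of the standard ranking. In the Python sketch below, an illustration under assumptions rather than Milord II's implementation, the standard strategy is represented only by its output, a ranked list of candidate questions, which is kept fixed here although in the shell it would be recomputed after each specialization. Keeping the standard choice when it is admissible, and otherwise taking the first admissible question in the same ranking, both amount to returning the first admissible question; None signals that no question can be selected, which acts as a stop condition. The Spec class, the ranking and the scripted answers are hypothetical; the strong dependencies are loosely modeled on the Terap-IA example of this paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, str]


@dataclass
class Spec:
    """Domain-dependent part of a specification of external behavior."""
    strong: Set[Tuple[str, str]] = field(default_factory=set)   # (p, q): p < q
    weak: Set[Tuple[str, str]] = field(default_factory=set)     # (q, r): q ≺ r
    unless: Dict[str, List[Callable[[State], bool]]] = field(default_factory=dict)

    def admits(self, q: str, state: State) -> bool:
        return (q not in state
                and all(p in state for p, x in self.strong if x == q)
                and all(r not in state for x, r in self.weak if x == q)
                and not any(c(state) for c in self.unless.get(q, [])))


def select_question(ranked: List[str], state: State, spec: Spec) -> Optional[str]:
    """First question in the standard ranking that the specification admits."""
    for q in ranked:
        if spec.admits(q, state):
            return q
    return None                      # nothing admissible: stop condition


def consult(ranked: List[str], answers: State, spec: Spec) -> List[str]:
    """Specialize-ask loop with a scripted user; returns the questions asked."""
    state: State = {}
    asked: List[str] = []
    while True:
        q = select_question(ranked, state, spec)
        if q is None:
            return asked
        state[q] = answers[q]        # stands in for asking the user
        asked.append(q)


SPEC = Spec(
    strong={("sex", "pregnant"), ("age", "pregnant"), ("sex", "breast_feeding"),
            ("age", "breast_feeding"), ("pregnant", "breast_feeding")},
    unless={"pregnant": [lambda s: s.get("sex") == "male"],
            "breast_feeding": [lambda s: s.get("sex") == "male"]},
)
RANKED = ["pregnant", "breast_feeding", "sex", "age"]   # hypothetical standard ranking

print(consult(RANKED, {"sex": "male", "age": "40"}, SPEC))   # -> ['sex', 'age']
print(consult(RANKED, {"sex": "female", "age": "30", "pregnant": "no",
                       "breast_feeding": "yes"}, SPEC))
```

Although the standard ranking puts pregnant first, the male patient is never asked about pregnancy or breast-feeding, and the female patient is asked about them only after sex and age are known.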

Notice that in a given problem-solving state it may happen that no question can be selected that has not already been asked and does not violate the specification of external behavior. Such situations are, in fact, nothing more than additional stop conditions of the expert system.

Cooperation  between  problem  solving  and  external  behavior,  as  presented  in 
this  section,  corresponds  then  to  a  modified  ask  step  in  the  specialize-ask  cycle, 
by  which  only  those  questions  satisfying  both  problem-solving  requirements  and 
external  behavior  requirements  are  asked  of  the  user. 


An  example 

The following module excerpt, taken from the Terap-IA expert system [9], was developed using the conventional trial-and-error approach for achieving an appropriate external behavior. It comprises metarules (i) to avoid asking whether the patient is pregnant or breast-feeding when the patient is a man, a girl under 15, or a woman over 45, (ii) to avoid asking a pregnant patient whether she is breast-feeding, and (iii) to avoid asking about facts that depend on pregnant or breast-feeding when pregnant or breast-feeding does not hold.

Module anam =
Begin
  Import age, sex, pregnant, pregnant_term, pregnant_period, breast_feeding,
         neonate, premature
  Export ...
  Deductive Knowledge
    Dictionary:
      Predicates:
        age = ...
        sex = ...
        pregnant = ...
          relation: needs sex
          relation: needs age
        pregnant_term = ...
          relation: needs pregnant
        pregnant_period = ...
          relation: needs pregnant
        breast_feeding = ...
          relation: needs sex
          relation: needs age
          relation: needs pregnant
        neonate = ...
          relation: needs breast_feeding
        premature = ...
          relation: needs breast_feeding
    Rules:
  End deductive
  Control knowledge
    Deductive control
      M001 if T(pregnant,sure) then conclude T(not(breast_feeding),sure)
      M002 if is(=(sex,(male)),sure) then conclude T(not(pregnant),sure)
      M003 if is(=(sex,(male)),sure)
           then conclude T(not(breast_feeding),sure)
      M004 if is(=(sex,(female)),sure) and T(=(age,$x),sure) and lt($x,15)
           then conclude T(not(pregnant),sure)
      M005 if is(=(sex,(female)),sure) and T(=(age,$x),sure) and lt($x,15)
           then conclude T(not(breast_feeding),sure)
      M006 if is(=(sex,(female)),sure) and T(=(age,$x),sure) and gt($x,45)
           then conclude T(not(pregnant),sure)
      M007 if is(=(sex,(female)),sure) and T(=(age,$x),sure) and gt($x,45)
           then conclude T(not(breast_feeding),sure)
      M008 if T(not(pregnant),sure) then conclude T(not(pregnant_term),sure)
      M009 if T(not(pregnant),sure)
           then conclude is(=(pregnant_period,(none)),sure)
      M010 if T(not(breast_feeding),sure) then conclude T(not(premature),sure)
      M011 if T(not(breast_feeding),sure) then conclude T(not(neonate),sure)
    Evaluation type: lazy
  End control
End

The  same  module  excerpt,  rewritten  to  make  use  of  heuristic  question  selection 
as  implemented  in  the  Milord  II  shell,  is  reproduced  below. 

Module anam =
Begin
  Import age, sex, pregnant, pregnant_term, pregnant_period, breast_feeding,
         neonate, premature
  Export ...
  Deductive Knowledge
    Dictionary:
      Predicates:
        age = ...
          relation: strong pregnant
          relation: strong breast_feeding
        sex = ...
          relation: strong pregnant
          relation: strong breast_feeding
        pregnant = ...
          relation: strong breast_feeding
          relation: strong pregnant_term
          relation: strong pregnant_period
        pregnant_term = ...
        pregnant_period = ...
        breast_feeding = ...
          relation: strong neonate
          relation: strong premature
        neonate = ...
        premature = ...
    Rules:
  End deductive
  Control knowledge
    Evaluation type: heuristic
  End control
End

Getting the right metarules for avoiding inadequate question sequences has proved to be no easy task. Leaving such methodological issues aside, the previous implementation excerpts illustrate the difference between the trial-and-error approach and the disciplined approach to external behavior presented in this paper. The improvement in declarativeness is evident from the text of the excerpts itself.

4  Conclusion 

Input-driven control of rule-based expert systems offers both a methodological aid during expert system development and an implementation technique for achieving an appropriate external behavior. The methodological improvement rests upon additional declarative power given to the rule-based paradigm, and allows expert system developers to concentrate on the conceptualization of the problem domain, while leaving procedural aspects to the procedural reading of the specification of external behavior made by the control strategy. Input-driven control, as implemented in the Milord II shell, supports appropriate interaction with the task environment while retaining correctness of problem solving. In this way, internal and external behavior cooperate to (re)produce correct solutions to problems through a right interaction with the task environment.


Acknowledgement 

The core of the research presented in this paper was carried out at the Institut d'Investigació en Intel·ligència Artificial of the Centre d'Estudis Avançats de Blanes. I am very grateful to the research team at Blanes, and special thanks are due to Dr. Jaume Agustí-Cullell for his support and encouragement.


References 

1. C. Barroso. ENS-AI: Un Sistema Experto para la Enseñanza. In Proc. European
Conf. about Information Technology in Education: A Critical Insight, Barcelona,
Nov. 1992.

2. D. C. Berry and D. E. Broadbent. Expert Systems and the Man-Machine Interface.
Part Two: The User Interface. Expert Systems, 4(1):18-28, Feb. 1987.

3.  L.  Godo,  R.  Lopez  de  Mautaras,  C.  Sierra,  and  A.  Verdaguer.  MILORD:  'I'lic  .Ar¬ 
chitecture  and  the  Management  of  Linguistically  Expressed  Uncertainty,  /tit.  Jour¬ 
nal  of  Intelligent  Systems,  4(4);471-501,  1989. 

4.  J.  A.  llendler,  editor.  Expert  Systems:  The  User  Interface.  Ablex,  Norwood,  .New 
Jersey,  1988. 

5.  J.  M.  Herod  and  A.  T.  Bahill.  Ameliorating  the  “Pregnant  .Man”  Prolih  in,  In 

A.  T.  Bahill,  editor.  Verifying  and  Validating  Personal  Computer- Hast  d  Ejpt  i  l 
Systems,  chapter  3,  pages  26-  45.  Prcptice  Hail,  Englewood  Glilfs,  New  Jersey,  19911, 

6.  J.  M.  Herod  and  A.  T.  Bahill.  Ameliorating  the  Pregnant  Man  Problem:  .A  \'eri- 
fication  Tool  for  Personal  Computer  Based  Expert  Systems,  hit.  Journal  of  .Man- 
Machine  Studies,  35:789-805,  1991. 

7.  R.  Lopez  de  Mantaras,  J.  Agusti,  E.  Plaza,  and  C.  Sierra.  MILORD:  A  Fn/zy 
Expert  Systems  Shell.  In  A.  Kandcl,  editor.  Fuzzy  Expert  Systems,  cliapler  1,5. 
CRC  Press,  Boca  Raton,  Florida,  1991. 

8.  J.  McDermott  and  C.  Forgy.  Production  System  ConRict  Resolution  Strategies,  In 
D.  A.  Waterman  and  F.  Hayes-Roth,  editors.  Pattern  Directed  Inference  Systems, 
pages  177-199.  Academic  Press,  New  York,  1978. 

9.  J.  Puyol,  L.  Godo,  and  C.  Sierra.  A  Specialisation  Calculus  to  Improve  F.xperl 
Systems  Communication.  In  B.  Neumann,  editor,  Proc.  tOth  European  Con/,  on 
Artificial  Intelligence  EC.AI  92,  pages  144-148,  Vienna,  Austria,  -Aug.  1992. 

10.  J.  Puyol,  C.  Sierra,  and  J.  Agusti-Cullell,  Partial  Evaluation  in  Milcnl  II:  .A  Lan¬ 
guage  for  Knowledge  Engineering.  In  Proc.  Europ-IA  91,  pages  I'l.i-  J()7,  P'9 1. 

11. R. Sauers. Controlling Expert Systems. In L. Bolc and J. Coombs, editors,
Expert System Applications, pages 79-197. Springer-Verlag, Berlin, 1988.

12. C. Sierra. MILORD: Arquitectura Multi-Nivell per a Sistemes Experts en Classifi-
cació. PhD thesis, Universitat Politècnica de Catalunya, May 1989.

13. G. Valiente. Verification of External Adequacy in Rule-Based Expert Systems
using Support Graphs. Free session contribution to the European Workshop on
the Verification and Validation of Knowledge-Based Systems EUROVAV '91, Cam-
bridge, England, July 1991. Published as Research Report 91/15, Centre d'Estudis
Avançats de Blanes.

14. G. Valiente. Using Layered Support Graphs for Verifying External Adequacy in
Rule-Based Expert Systems. SIGART Bulletin, 3(1):20-24, Jan. 1992.

15. G. Valiente. Heuristic Question Selection. PhD thesis, Universitat Autònoma de
Barcelona, 1993. Forthcoming.


Case-based  planning  for  medical  diagnosis 


Beatriz López* and Enric Plaza**

Institut d'Investigació en Intel·ligència Artificial (IIIA, CSIC),
Camí de Santa Bàrbara, s/n., Blanes, Girona, Spain.
lieaiCicealJ.es  and  plaza^ceah.es 


Abstract. In this paper we describe a case-based planner (BOLERO) de-
veloped for learning the procedure of a diagnosis in a medical expert system.
A diagnostic plan is built according to the most recent information known
about a patient. BOLERO has been tested in a real application of pneumo-
nia diagnosis. Results show that BOLERO is able to acquire enough strategic
knowledge to perform a diagnostic procedure with a high degree of success.


1  Introduction 

For some time it has been argued that keeping "control" knowledge separate and
distinct from domain knowledge, as represented in an expert system, is convenient for
expert system maintainability and for the reuse of shells on similar problems [1]. In
earlier experiments at our Institute, a meta-level architecture was developed with
strategic (or control) knowledge represented at the meta-level and domain knowledge
represented at the object level [5]. While designing a medical diagnosis expert system
[14], we also discovered that acquiring the strategic knowledge was complicated and
time-consuming for both the expert and the knowledge engineer. In this paper we
present an experiment in the automated acquisition of strategic knowledge for expert
systems by means of a case-based approach.

In more concrete terms, strategic knowledge in the expert systems built at the
IIIA was used to decide which diseases were plausible or useful to try to prove (or
disprove, for dangerous illnesses), and which questions relevant to those ends to ask
at each moment, given the known facts. The questions should also be asked in a
meaningful order, and the disease hypotheses should be pursued in a reasonable order,
lest the physicians using the system complain about it doing silly things and acting
"erratically". In [5] strategic knowledge takes the form of plans of action, i.e. a
dynamically changing scheme of the goals worth pursuing. Acquiring this knowledge
was slow and based on a process of trial and error, which continued until the medical
application for diagnosing pneumonia performed as expected. Exploiting the modularity
of this meta-level architecture and the separate representation of domain and strategic
knowledge, we investigated the possibility of automatically learning the diagnostic
procedure (strategic knowledge) given (a) the domain knowledge, and (b) an expert
authenticating the desired system behavior. This is the experiment reported here,
where strategic knowledge, represented by


*  Supported  by  a  grant  from  the  MEC. 

**  Partially  supported  by  Project  AMP  CYCIT  90/801. 




plans, is learned using a case-based approach. Learned plans are reused to solve new
problems, in such a way that BOLERO acts as the planner of a rule-based system
(RBS) with scarce strategic knowledge. The structure of the paper is as follows: in the
next section we introduce the meta-level architecture in which BOLERO and an RBS
collaborate during problem solving; in section 3 we describe BOLERO, the case-based
planner; in section 4 we explain the results of BOLERO when applied to learn and
build plans for pneumonia diagnosis. Finally, we relate our research to other work and
give some conclusions.


2  The  BOLERO-RBS  Reactive  System 


The BOLERO-RBS system is the result of integrating the case-based system
BOLERO and a rule-based system (RBS) within a meta-level architecture. BOLERO
plays the role of the meta-level and the RBS plays the role of the object level (fig-
ure 1). The RBS has knowledge about a specific domain in order to solve a problem,
while BOLERO has planning knowledge particular to that domain. The BOLERO-
RBS system is a reflective system [11]: the meta-level (BOLERO) is able to modify
the state of the object level (RBS) through plans, and it is also in charge of starting
and stopping the system execution.


Fig.  1.  The  BOLERO-RBS  system. 

For the sake of simplicity we will not go into details about how both systems
interact (this has been the aim of the paper [10]). Essentially, BOLERO builds plans
according to the state of the RBS when solving a problem, and the plans are executed
by the RBS. An action in BOLERO corresponds, in the RBS, to validating a goal by
chaining rules. While the RBS is validating a goal, an action is being executed, and
when the goal has been achieved, the action is completed. The complete execution of
an action yields some evidence about the patient's illness. For example, the action
mycoplasma is interpreted by the RBS as the goal pneumonia-caused-by-mycoplasma;
when it is known that it is quite-possible that the pneumonia is caused by mycoplasma,
the action is complete and some evidence about the patient's illness has been found.
When the RBS obtains new information as a consequence of the execution of a plan,
BOLERO can generate a new plan adequate to the new circumstances. Thus, the
validation of any goal in the RBS can be interrupted if BOLERO dynamically generates
a new plan. In the same way, when an action a is interrupted at a given moment
of the problem-solving process, the action can be continued at a later moment if
BOLERO builds a plan that contains a. To continue an action means to go on with
the validation of the action's goal in the RBS. The generation and execution of
plans is thus interleaved in BOLERO-RBS, in such a way that the integrated system
behaves as a reactive planner [4].
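The interleaving of plan generation and execution can be pictured as a loop in which every action is a goal validation that may be suspended and later resumed. The Python sketch below only illustrates this under our own assumptions and is not the BOLERO-RBS implementation. `build_plan` stands in for BOLERO. Each RBS goal validation is modeled as a generator that yields once per rule-chaining step and returns the facts it establishes. All names are hypothetical.

```python
from typing import Callable, Dict, Generator, List, Set, Tuple

Fact = Tuple[str, str]
# Hypothetical model of an RBS goal validation: the generator yields once per
# rule-chaining step and returns the facts it established when the goal is met.
Validation = Generator[None, None, Set[Fact]]


def reactive_solve(build_plan: Callable[[Set[Fact]], List[str]],
                   actions: Dict[str, Callable[[], Validation]],
                   max_steps: int = 100) -> Set[Fact]:
    situation: Set[Fact] = set()
    partial: Dict[str, Validation] = {}  # started but unfinished validations
    completed: Set[str] = set()
    for _ in range(max_steps):
        # (Re)plan from the current situation, skipping completed actions.
        plan = [a for a in build_plan(situation) if a not in completed]
        if not plan:
            break  # nothing left to pursue: stop problem solving
        name = plan[0]
        # Continue an earlier validation of this goal if there is one.
        run = partial.pop(name, None) or actions[name]()
        try:
            next(run)            # one rule-chaining step
            partial[name] = run  # goal not yet validated
        except StopIteration as done:
            completed.add(name)
            situation |= done.value or set()  # new evidence shapes the next plan
    return situation
```

The plan is recomputed from the current situation after every step. A new fact can therefore redirect execution to a different action. Any partially validated goal stays in `partial` and is continued if a later plan contains it again.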




Reactive planning has been shown to be a useful technique for dealing with
uncertainty and incompleteness, a constant feature of medical applications [10]. The
information in medical diagnosis is incomplete because many situations cannot be
fully represented, since there is no way of gathering all the information needed; at
the same time, the available information is uncertain because it lacks precision or
requires subjective assessments. For example, let us suppose that a 25-year-old
patient with a risk factor of infection by HIV arrives at the hospital suffering from
pneumonia. The physician plans a diagnostic procedure in order to find the agent
causing the pneumonia. This procedure, or plan, consists of the following actions: take
an X-ray of the thorax, do a hemography, and do a gram-sputum test. Let us assume
that the physician sees an interstitial pneumonia in the X-ray. With this information,
and taking into account the risk factor of infection by HIV, the physician considers
the possibility that the patient has AIDS. Consequently he decides to give up the
rest of the pending actions (i.e. the hemography and the gram-sputum test) and to
perform some new action (for example a broncho-alveolar lavage) to determine whether
Pneumocystis carinii is causing the pneumonia. Thus, given the initial information
available about a patient, the physician generates a plan p, and when his knowledge
about the patient is enriched as a consequence of starting the execution of p, he
produces a new, different plan p'. BOLERO, as a reactive planner, takes the problem
of incomplete information into account in such a way that it can generate plans
dynamically, whenever new information about a problem becomes available to the system.

3  The  BOLERO  System 

In this section we explain the knowledge represented in BOLERO (section 3.1) and
the components of BOLERO as a case-based planner: the learning method (sec-
tion 3.2), how BOLERO builds plans (section 3.3), and the plan evaluation method
(section 3.4).

3.1  Knowledge  Representation 

Planning knowledge in BOLERO is based on the information stored in the cases
provided by the teacher. Cases determine the plan memory from which BOLERO is
able to solve problems.

Cases. A case is the set of episodes observed when a teacher solves a problem (figure 2):
$C^i = (e^i_0, e^i_1, \ldots, e^i_k)$. Each episode is a pair $e^i_j = (s^i_j, p^i_j)$ formed by a
situation $s^i_j$ and a plan $p^i_j$. In the first episode, $s^i_0$ represents the initial state of a
problem, and $p^i_0$ a plan containing the immediate action $a^i_0$ with which to start problem
solving. The situation of the next episode, $s^i_1$, represents the state of the problem
reached after $a^i_0$ has been applied, and $p^i_1$ is the concatenation of the plan $p^i_0$ and a
new action $a^i_1$ with which to continue problem solving given the new information in
$s^i_1$. And so on until the last episode, which contains the final state $s^i_k$ and the plan
$p^i_k$ that leads from $s^i_0$ to $s^i_k$. Then $p^i_k$ is the solution of the problem from the point
of view of planning. The plan $p^i_k$ is called the final plan, while the rest of the
plans of a case are called partial plans. Each state or situation $s^i_j$ of a case is
composed of a set of facts. A fact $f_a$ is a pair $(id_a, v_a)$, where $id_a$ is the identifier of the




fact and $v_a$ its value. The final state $s^i_k$ must contain the diagnosis of the patient's
illness, or domain solution $S_i$. For example, the domain solution of the case repre-
sented in figure 2, $S_{g22v1}$, is: {(pneumococcal-pneumonia, possible), (pneumonia-by-
enterobacteriaceae, moderately-possible), (pneumonia-by-legionella, quite-possible),
(tuberculosis, very-little-possible), (pneumonia-by-pseudomonas, unknown)}. The
domain solution comes from the results of the individual actions $a_j \in p^i_k$, which we
denote by $result(a_j)$. For example, result(pneumococ) = {(pneumococcal-pneumonia,
possible)}.

e0: s0 = { }
    p0 = (gather-information)
e1: s1 = {(community-acquired, yes), (state-of-patient, moderately-serious), (stablishment, sudden), (antecedents, (kind-of-immunodeficiency, advanced-cancer, treatment-with-antineoplastic-agents)), (X-rays, pleural-effusion), (dispnea, no), (sample-of-pleural-liquid, yes)}
    p1 = (gather-information, effusion)
e2: s2 = {(exudate-effusion, unknown), (pleural-stain-made, yes), (germs-pleural-stain, yes), (empyema, certain), (germ-identification, unknown), (gram-stain, gram-positive-coccus-in-pairs), . . . }
    p2 = (gather-information, effusion, bact-atip)
e3: s3 = s2 ∪ {(bacterian-pneumonia, certain), (headache, unknown), (atipical-pneumonia, unknown)}
    p3 = (gather-information, effusion, bact-atip, pneumococ)
e8: s8 = s7 ∪ {(pneumonia-by-pseudomonas, unknown)}
    p8 = (gather-information, effusion, bact-atip, pneumococ, enterobacteria, legionella, tuberculosis, pseudomonas)


Fig. 2. Some partial plans and the final plan of case "g22v1".

Plan Memory. The plan memory is organized according to the episodes of cases.
Since the initial episode of every case has the same start plan (i.e. $e^i_0 = (s^i_0, p_0)$),
it is possible to build a hierarchical organization of cases. A node $N^l$ is a tuple
$(v^l, p^l, t^l)$ made up of a generalized situation $v^l$, a partial plan $p^l$, and the typical-
ity $t^l$ of $p^l$. The generalized situation $v^l$ is the result of a generalization process over
all the situations $s$ of the episodes $e^i_j$ that contain the plan $p^l$ (see next section).
Generalized situations are composed of generalizations, i.e., $v^l = \{g^l_j\}$. Each gen-
eralization $g^l_j$ is a tuple $(id^l_j, v^l_j, w^l_j, \alpha^l_j)$, where $id^l_j$ is the identifier, $v^l_j$ a value or set
of values, $w^l_j$ the strategic relevance of the generalization, and $\alpha^l_j$ its frequency. The
strategic relevance is a number in the unit interval that indicates the strength of the
link between a generalization $g^l_j$ and a plan $p^l$. A strategic relevance of 0 means
that the generalization has been completely irrelevant when building the plan $p^l$ of
the node $N^l$; a generalization with a strategic relevance of 1 is definitely relevant.
Identifiers of facts are used in BOLERO to index generalizations with the same
identifier, and the strategic relevance of the generalization plays an important role
when recovering cases from memory. The strategic relevance is, in some way, the
substitute for the frequency used in other learning mechanisms such as COBWEB [3]
and LAMDA [12]. The indexing mechanisms of such systems rely heavily on the
frequency of facts but, particularly in medical domains, we found that a fact with
a relatively low frequency might be very relevant. Specific data tend to be more
relevant, although they are not often available.

The plan of a node $N^l$ has one action more than the plan of its direct ancestor
$N^{l-1}$ and one action less than the plan of its direct successors $N^{l+1}$, in such a way
that $p^{l+1} = p^l; a^{l+1}$, where, obviously, $a^{l+1}$ is different for each successor of $N^l$. The
typicality $t^l$ is the number of times that $p^l$ has been applied in the cases. A leaf
node $N^l$, in addition to the information concerning a node, has a set of solutions
$S^l$: $N^l = (v^l, p^l, t^l, S^l)$. The set of solutions $S^l$ contains the domain solutions of the
cases that have $p^l$ as the final plan: $S^l = \{S_i \mid p^i_k = p^l\}$.
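The node and generalization tuples can be written down as plain data structures. The dataclasses below are a minimal sketch. Field names such as `ident`, `relevance`, `typicality` and `solutions` are hypothetical. Only the unit-interval check on the strategic relevance follows directly from the definition above.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set, Tuple

Fact = Tuple[str, str]  # (identifier, value)


@dataclass
class Episode:
    situation: FrozenSet[Fact]  # s_j: the facts known so far
    plan: Tuple[str, ...]       # p_j: p_{j-1} plus one new action


@dataclass
class Generalization:
    ident: str         # identifier shared with the facts it covers
    values: Set[str]   # a value or a set of values
    relevance: float   # strategic relevance w, in [0, 1]
    frequency: float   # frequency alpha

    def __post_init__(self) -> None:
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("strategic relevance must lie in [0, 1]")


@dataclass
class Node:
    plan: Tuple[str, ...]  # partial plan p^l
    situation: List[Generalization] = field(default_factory=list)  # v^l
    typicality: int = 0    # t^l: times p^l was applied in the cases
    children: List["Node"] = field(default_factory=list)
    # S^l: domain solutions of the cases ending here (leaf nodes only).
    solutions: Optional[List[FrozenSet[Fact]]] = None
```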

3.2  Learning  Plans 

BOLERO learns plans by organizing the cases $C^1, C^2, \ldots, C^n$ of a training set
in its memory. BOLERO learns plans incrementally: at first the memory is empty,
and cases are incorporated one by one. The incorporation of a case into memory is
based on two main methods: a generalization method and a termination-conditions
learning method.

Incorporation of a Case. A case $C^i$ is added to the memory by comparing the
plan $p^i_j$ of its episode $e^i_j$ with every node of level $j$ of the hierarchy. If there is a
node $N^l$ in level $j$ whose plan is $p^l = p^i_j$, then the generalized situation of $N^l$
is updated (see next paragraph). The incorporation of the case continues with the next
episode $e^i_{j+1}$ of the case $C^i$ and the nodes of level $j+1$ that are successors of $N^l$. The
incorporation method is applied to each episode of the case. If there is no node with
a partial plan like $p^i_j$, then a new branch of the hierarchy is started from level $j-1$ and
the rest of the episodes of the case, $e^i_j, \ldots, e^i_k$, become nodes of the new branch. The
last episode of a case is stored in a leaf node, where termination-conditions learning
takes place.
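The incorporation procedure behaves like insertion into a tree keyed by plan prefixes. Each node's plan extends its parent's by exactly one action. Matching $p^i_j$ against the nodes of level $j$ therefore reduces to looking up the last action among the children of the node matched at level $j-1$.

The sketch below assumes that reading. `incorporate` and the other names are hypothetical. For brevity it stores raw situations where BOLERO would update the generalized situation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

Fact = Tuple[str, str]
Episode = Tuple[FrozenSet[Fact], Tuple[str, ...]]  # (situation, plan)


@dataclass
class Node:
    plan: Tuple[str, ...]
    situations: List[FrozenSet[Fact]] = field(default_factory=list)
    typicality: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)  # by added action
    solutions: List[FrozenSet[Fact]] = field(default_factory=list)


def incorporate(root: Node, case: List[Episode],
                domain_solution: FrozenSet[Fact]) -> None:
    """Add one case to the plan hierarchy rooted at the common start plan."""
    node: Optional[Node] = None
    for situation, plan in case:
        if node is None:
            if plan != root.plan:
                raise ValueError("every case must share the start plan")
            child = root
        else:
            if plan[:-1] != node.plan:
                raise ValueError("each plan must extend the previous one")
            child = node.children.get(plan[-1])
            if child is None:  # no node with this partial plan: new branch
                child = node.children[plan[-1]] = Node(plan=plan)
        child.situations.append(situation)  # stand-in for generalization
        child.typicality += 1
        node = child
    if node is not None:
        node.solutions.append(domain_solution)  # last episode: leaf data
```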

Generalization. When a plan has been applied to two different situations (i.e.
there are two different episodes made up of the same plan and different situations),
the generalization method provides a new generalized situation that covers the two
original situations. Given a generalized situation $v^l$ of a node and an episode $e^i_j$ of
a case, the generalization procedure consists of the following steps:

1. Generalize the values of the facts $f_a$ in $s^i_j$ and of the generalizations $g^l_b$ in $v^l$
that have the same identifier. The generalization of two values $v_1$ and $v_2$ is the
union of the two values, $\{v_1\} \cup \{v_2\}$.

2. If an abstraction between a fact and a generalization with different identifiers is
possible, perform it. Abstraction of facts is based on relations defined among the
identifiers of facts (or generalizations) in the domain network (see figure 3).
Given a set of fact identifiers $id_1, id_2, \ldots, id_n$, abstraction produces a new identifier
$id$ that has a relation with each $id_j$.

3. Otherwise, add the fact to the generalized situation $v^l$ of the node.
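The three generalization steps can be sketched as follows, representing a generalized situation as a mapping from identifiers to value sets. Two points are our own assumptions rather than details given in the text:

- The domain network is reduced to a small hypothetical `KIND_OF` table inspired by figure 3.
- An abstraction merges the value sets of the two abstracted identifiers under their common parent.

Strategic relevance and frequency are omitted.

```python
from typing import Dict, Optional, Set, Tuple

Fact = Tuple[str, str]  # (identifier, value)

# Hypothetical fragment of the domain network (cf. figure 3):
# identifier -> parent identifier through a kind-of relation.
KIND_OF: Dict[str, str] = {
    "skin-data-assoc": "associated-clinical-data",
    "neurological-data-assoc": "associated-clinical-data",
    "circulatory-data-assoc": "associated-clinical-data",
}


def common_parent(id1: str, id2: str) -> Optional[str]:
    """Identifier related to both id1 and id2, if the network has one."""
    p1, p2 = KIND_OF.get(id1), KIND_OF.get(id2)
    return p1 if p1 is not None and p1 == p2 else None


def generalize(v: Dict[str, Set[str]], situation: Set[Fact]) -> Dict[str, Set[str]]:
    """Merge the facts of one episode into a generalized situation v."""
    out = {ident: set(vals) for ident, vals in v.items()}  # leave v untouched
    for ident, value in situation:
        if ident in out:                 # step 1: same identifier, union of values
            out[ident].add(value)
            continue
        for other in list(out):          # step 2: try an abstraction
            parent = common_parent(ident, other)
            if parent is not None:
                merged = out.pop(other) | {value}
                out.setdefault(parent, set()).update(merged)
                break
        else:                            # step 3: otherwise add the fact
            out[ident] = {value}
    return out
```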

Termination Conditions Learning. One particularity of planning in a medical domain
is that the goal specification is unknown prior to the elaboration of a plan. Planning
has classically been formulated as the process of building a sequence of operations
that achieves a goal state from an initial state, where both the goal and the initial
state are given as the problem statement. In medical diagnosis the initial state is
given but the goal state is unknown: knowing the goal state is equivalent to knowing
the solution of the problem, i.e. the patient's illness. Since the goal state is not
provided, BOLERO must have some method to recognize when




[Figure 3 diagram: nodes semiology, associated-clinical-data, dispnea, skin-data-assoc, neurological-data-assoc and circulatory-data-assoc, and the facts (associated-clinical-data, skin) and (skin-data-assoc, cellulitis), connected by belongs-to, kind-of and is-a links.]

Fig. 3. Partial view of the domain knowledge network, where different relations
among types of facts are defined (belongs-to, kind-of, etc.). Facts are connected
with an is-a link to the identifier.

an adequate solution for a problem has been reached. Termination-conditions learning
occurs in BOLERO in two different ways: 1) by comparison with the domain solutions
provided in the cases, and 2) by domain knowledge. Solutions stored in leaf
nodes $N^l$ can be recalled when solving new problems, and if a similar solution
is found then the solving of the new problem is terminated. However, the sets $S^l$ are
weak conditions, in the sense that it is very rare for two cases to have the same
domain solution, even if they have the same final plan. For example, the cases "r13v1"
and "m07v1" have the same final plan (gather-information, bact-atip, pneumococ,
legionella, enterobacteria, s-pyogenes), but their domain solutions are different
because the fact values gathered during the execution of the actions have been
different. So while the solution of case "r13v1" is {(pneumococcal-pneumonia, certain),
(pneumonia-by-legionella, very-little-poss), (pneumonia-by-enterobacteriaceae, very-
little-possible), (pneum-by-s-pyogenes, unknown)}, the solution of case "m07v1"
is {(pneumococcal-pneumonia, possible), (pneumonia-by-legionella, very-possible),
(pneumonia-by-enterobacteriaceae, possible), (pneum-by-s-pyogenes, unknown)}. For
this reason BOLERO also uses a set $S_c$ of strong conditions acquired from a domain
expert: if any condition in $S_c$ is satisfied, we say that the domain solution
of the problem has been achieved, and so problem solving has to stop.
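A termination test combining the expert's strong conditions with the weak leaf-node conditions might look like the sketch below. The text does not say how two domain solutions are compared. The Jaccard index, the 0.8 threshold and all function names are therefore assumptions. Strong conditions are modeled as predicates over the current situation.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Fact = Tuple[str, str]
Solution = FrozenSet[Fact]
StrongCondition = Callable[[FrozenSet[Fact]], bool]  # an element of S_c


def jaccard(a: Solution, b: Solution) -> float:
    """Assumed similarity between two domain solutions."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def should_stop(situation: FrozenSet[Fact], current: Solution,
                strong: Iterable[StrongCondition],
                leaf_solutions: Iterable[Solution],
                threshold: float = 0.8) -> bool:
    # Strong conditions from the expert: any satisfied condition ends solving.
    if any(cond(situation) for cond in strong):
        return True
    # Weak conditions: a stored solution of a previous case is similar enough.
    return any(jaccard(current, s) >= threshold for s in leaf_solutions)
```

Take the two solutions quoted above. They share only the pair (pneum-by-s-pyogenes, unknown), so under this measure their similarity is 1/7. This illustrates why stored solutions alone make weak termination conditions.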

3.3  Plan  generation 

To solve a problem $\pi^i$ means to build a plan that, starting from the initial situation
$s^i_0$, reaches a situation containing a domain solution. While solving a problem, different
plans are built according to the different situations reached as a consequence of inter-
leaving the generation and execution of plans. The current situation $s^i_j$ is reached
through the application of a plan $p^i_{j-1}$ in the previous situation $s^i_{j-1}$. The execution of
$p^i_{j-1}$ leads to the knowledge of a new fact $f$ that determines the new situation
$s^i_j = s^i_{j-1} \cup \{f\}$. In the new situation $s^i_j$ BOLERO builds a plan $p^i_j$, possibly different
from the plan $p^i_{j-1}$ and more adequate to the new circumstances. The method to build a
plan (as in any other case-based system) is based on two basic steps: retrieval and
adaptation. Both steps are performed each time a new situation $s^i_j$ is known, until
BOLERO decides to terminate the problem-solving process.




Plan Retrieval. Let us assume we are solving the problem $\pi^i$, and that we know
a set of facts, or current situation, $s^i_j$. The retrieval method consists of recalling
generalized situations similar to $s^i_j$, in a three-step procedure: (1) index the memory
by the identifiers of the facts of $s^i_j$ and, using a spreading activation mechanism [7],
recover a set of generalized situations $G = \{v^l\}$; (2) compute the similarity between
the facts $f_a$ in $s^i_j$ and the generalizations $g_k$ in each $v^l$ using the function
$m_f$; and finally (3) compute the overall similarity between $s^i_j$ and each situation $v^l$
in $G$ using the aggregation function $m_s$. Given a generalization $g_k = (id_k, v_k, w_k, \alpha_k)$
in memory and a fact $f_a = (id_a, v_a)$ in $s^i_j$ with the same identifier (otherwise
$m_f(g_k, f_a) = 0$):

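$$
m_f(g_k, f_a) =
\begin{cases}
0 & \text{if } \mathit{monovalued}(f_a) \wedge v_a \neq v_k \\
C_m(w_k) & \text{if } \mathit{multivalued}(f_a) \wedge v_a = v_k \\
w_k \, (1 - d(v_a, v_k)) & \text{if } \mathit{fuzzy}(f_a) \\
w_k & \text{otherwise}
\end{cases}
$$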


where $C_m$ is a disjunctive aggregation function of $m$ evidences based on a t-conorm [12,
1]. The function $m_s$ combines the evaluations $m_f$ of the generalizations in $v^l$ in
the following way:


m,{xP,s\) 


— - +  — ‘“-Tni - 


where $n$ is the number of generalizations in $v^l$ that have frequency 1.0; $A$ is the
number of facts in $v^l$ that are also in $s^i_j$ and whose frequency is less than 1;
and $|s^i_j|$ is the cardinality of $s^i_j$. The function $m_s$ computes a mean of the
inclusion (first term of the numerator) and the exclusion (second term) of the facts
of the current situation in a generalized situation.
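The case analysis defining $m_f$ translates directly into code, and the sketch below covers only that function. Three choices in it are ours, made for illustration:

- The disjunctive aggregation $C_m$ uses the probabilistic sum $a + b - ab$, one standard t-conorm.
- The distance $d$ for fuzzy values is assumed to be the absolute difference of values scaled to [0, 1].
- The text does not spell out which $m$ evidences are aggregated, so the caller passes them in, defaulting to the single evidence $w_k$.

```python
from functools import reduce
from typing import Optional, Sequence, Tuple


def prob_sum(a: float, b: float) -> float:
    """Probabilistic sum, a standard t-conorm."""
    return a + b - a * b


def c_m(evidences: Sequence[float]) -> float:
    """Disjunctive aggregation of m evidences (0 is the t-conorm identity)."""
    return reduce(prob_sum, evidences, 0.0)


def m_f(gen: Tuple[str, object, float], fact: Tuple[str, object], kind: str,
        evidences: Optional[Sequence[float]] = None) -> float:
    """Similarity between a generalization (id_k, v_k, w_k) and a fact (id_a, v_a)."""
    id_k, v_k, w_k = gen
    id_a, v_a = fact
    if id_k != id_a:
        return 0.0  # different identifiers
    if kind == "monovalued" and v_a != v_k:
        return 0.0
    if kind == "multivalued" and v_a == v_k:
        return c_m(evidences if evidences is not None else [w_k])
    if kind == "fuzzy":
        return w_k * (1.0 - abs(float(v_a) - float(v_k)))  # assumed distance d
    return w_k
```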

Plan Adaptation. The adaptation of the plan of the node retrieved from
memory consists of avoiding the repetition of actions already executed in the current
situation $s^i_j$. That is, if an action $a_j$ has already been performed at a given moment
$x$, then, assuming that situations are monotonic, for any later situation $s_y$ ($s_x \prec s_y$),
$result(a_j) \subseteq s_y$. So there is no sense in including the action $a_j$ again in a plan
to continue problem solving. The outcome of the adaptation of the plan $p^{l'}$
retrieved from memory is therefore the new plan $p^i_j$, where any action $a_l$ of $p^i_j$ belongs to $p^{l'}$
but $result(a_l) \not\subseteq s^i_j$. For example, let us suppose that in a given situation $s^i_j$ the
plan p = (gather-information, bact-atip, pneumococ, enterobact, legionella) has been
applied and the results of all its actions are known. Let us also suppose that in
the situation $s^i_j$ a new node $N^{l'}$ is retrieved, whose plan is $p^{l'}$ = (gather-information,
bact-atip, estafilococ, pneumococ, enterobact, tuberculosis). Then the plan generated
is $p^i_j$ = (estafilococ, enterobact, tuberculosis). Actions gather-information, bact-atip,
and pneumococ of $p^{l'}$ are not included in $p^i_j$ since they have already been completed.
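The adaptation step is a filter over the retrieved plan. The sketch assumes that the results of executed actions are recorded in a dictionary `result`, a hypothetical name. An action is kept only if it has no recorded result or its result is not yet contained in the current situation, i.e. $result(a) \not\subseteq s^i_j$.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fact = Tuple[str, str]


def adapt(retrieved_plan: List[str], situation: Set[Fact],
          result: Dict[str, FrozenSet[Fact]]) -> List[str]:
    """Drop actions whose results are already contained in the situation."""
    return [a for a in retrieved_plan
            if a not in result or not result[a] <= situation]
```

Here is an example with hypothetical data. Suppose gather-information, bact-atip and pneumococ have been executed and their results are in the situation. Adapting (gather-information, bact-atip, estafilococ, pneumococ, tuberculosis) then leaves (estafilococ, tuberculosis).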
Termination. There are four possibilities to end problem solving: (1) when no
active node is retrieved from memory; (2) when some termination condition described
in $S_c$ is satisfied in the current situation $s^i_j$; (3) when the current domain solution
and a solution in a leaf node (i.e. a solution of a previous case) are similar; and (4)
when the final plan of a node $N^l$ has already been executed in the current situation
and the computed overall similarity for the currently retrieved node
is  mg(tP,Sa)  =  oi<^P  =  mc(v‘,Sa)  I9|. 




3.4  Plan  Evaluation 

Plan evaluation in medical diagnosis provides information about whether the diagnosis
of the patient has been achieved (the correctness of the plan) and about the quality of
the procedure followed to achieve the diagnosis (the accuracy of plans). Evaluation
requires some kind of knowledge that we call a gold standard, i.e. the right, desired
solution. The gold standard is a model of functionality, which for plan evaluation
purposes is the optimal plan. Determining a gold standard is not an easy task, as
has been demonstrated elsewhere [5]. For this reason we have defined in BOLERO
an approximation to the gold standard that we call the evaluation standard (ES).
The ES is defined for each training case $C$ and consists of an admissible plan
$PA_p(C)$ and a set of domain solutions $PA_d(C)$. On the one hand, $PA_p$ is provided by
an oracle. On the other hand, $PA_d$ is an approximation to the ideal domain solution
that we build upon the consensus of several experts. BOLERO uses the ES to
determine the correctness and the accuracy of plans.

Correctness of Plans. After solving a problem, BOLERO obtains a new case $C'$.
The plan succeeds (and so the plan is correct) if all the actions required by $PA_p$
are in the final plan $p'_k$. This condition of success guarantees that BOLERO has at
least found the same diagnosis as the teacher, since BOLERO has performed the same
actions with the same data. It is possible that some actions performed by the teacher
do not help to establish the diagnosis of the patient. However, such actions should not
be forgotten, since in medical diagnosis it is preferable to perform additional actions
than to forget or discard actions that may lead to diagnosing some dangerous disease.
Thus we say that BOLERO has a conservative bias. The degree of success of a plan
is the ratio between the number of actions of $PA_p$ included in $p'_k$ and the total number
of actions of $PA_p$. So, a plan is correct if its degree of success is 100%.

Accuracy of Plans. Besides the degree of success of plans we have defined the
degree of focus, which is simply the complement of the degree of
out-focus: degree of focus = 100 - degree of out-focus. The degree of out-focus is the
percentage of the actions performed that could have been spared. Among all actions of a final
plan that are not in $PA_p$, we consider an action unnecessary if its results
are not justified by the domain experts (i.e. they are not in $PA_d$).

To illustrate plan evaluation with an example, let us suppose that BOLERO
has solved the case "g24v1" with the plan $p_{g24v1}$ = (gather-information, bact-atip,
virus, chlamydia, q-fever), and that $PA_p$ is (gather-information, bact-atip, virus,
mycoplasma, chlamydia). The action mycoplasma is not included in $p_{g24v1}$, so
the plan built contains 4 of the 5 required actions and has a degree of success of 80%. The degree of focus is 100% because,
although the action q-fever is not in $PA_p$("g24v1"), it is contained in $PA_d$("g24v1").
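Both evaluation measures reduce to simple counts. In the sketch below the expert-consensus solution $PA_d$ is simplified, as our own assumption, to the set `justified` of actions whose results it supports. Function names are hypothetical. The last lines reproduce the "g24v1" example.

```python
from typing import Iterable, List, Set


def degree_of_success(final_plan: Iterable[str], pa_p: Iterable[str]) -> float:
    """Percentage of the admissible plan's actions present in the final plan."""
    plan, required = set(final_plan), set(pa_p)
    return 100.0 * len(required & plan) / len(required)


def degree_of_focus(final_plan: Iterable[str], pa_p: Iterable[str],
                    justified: Set[str]) -> float:
    """100 minus the percentage of performed actions that could be spared."""
    plan: List[str] = list(final_plan)
    required = set(pa_p)
    unnecessary = [a for a in plan if a not in required and a not in justified]
    return 100.0 - 100.0 * len(unnecessary) / len(plan)


plan = ["gather-information", "bact-atip", "virus", "chlamydia", "q-fever"]
pa_p = ["gather-information", "bact-atip", "virus", "mycoplasma", "chlamydia"]
print(degree_of_success(plan, pa_p))             # 80.0
print(degree_of_focus(plan, pa_p, {"q-fever"}))  # 100.0
```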

4  An  Application  to  Medical  Diagnosis 

We have applied BOLERO to build plans to diagnose the agent causing pneumonia.
Pneumoniae are frequent illnesses that need urgent treatment [14]. Early treat-
ment means that the physician needs to diagnose the agent causing the infection
by taking advantage of all the information he has about the patient, before know-
ing the results of some tests. Although complementary tests can be performed, the
physician generally does not know the etiology of the illness. Incomplete and uncer-
tain information are characteristics of pneumonia diagnosis that persist throughout the
therapeutic process. Decisions must be taken, and the physician's experience
with previous cases plays an important role in the diagnosis of the patient.

Our starting point has been a set of 79 cases provided by a teacher (a physician
expert in diagnosing pneumoniae), corresponding to real cases taken from 4 hospi-
tals. We have a set of 460 facts representing the pneumonia domain. Each case
contains, on average, 88.23% of the facts, and 22.87% of the fact values are unknown.
To execute the plans built by BOLERO, we have a rule-based system available, PNEUMON-IA [14],
that is able to interpret and execute the plans.

We evaluated BOLERO experimentally using the leave-one-out method. An experiment consists of training the system with 78 cases and testing it with a test case C (selected randomly among the 79 cases). We performed 10 experiments, with 10 different test cases, changing the order of the cases in the training set. To evaluate BOLERO we defined two measures: identification and predictiveness. Identification evaluates the degree of confidence that the system shows in solving a case already in memory (i.e. a case of the training set). Predictiveness measures the degree to which BOLERO is able to solve a case not seen before (i.e. a test case). The measurements were taken when BOLERO had n cases in memory, then n+k, then n+2k, and so on until all cases had been learned.
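The evaluation protocol above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not BOLERO's code: `learn` and `score` are hypothetical placeholders for building the case memory and for measuring how well a case is solved (for instance degree of success or degree of focus), and the values of `n` and `k` are free parameters chosen by the experimenter.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

C = TypeVar("C")  # a case
M = TypeVar("M")  # whatever the (hypothetical) learner builds from its memory


def checkpoints(n: int, k: int, total: int) -> List[int]:
    """Memory sizes n, n+k, n+2k, ..., always ending with `total`."""
    points = list(range(n, total + 1, k))
    if not points or points[-1] != total:
        points.append(total)
    return points


def leave_one_out_curve(
    cases: Sequence[C],
    test_index: int,
    learn: Callable[[List[C]], M],
    score: Callable[[M, C], float],
    n: int,
    k: int,
    rng: random.Random,
) -> Dict[int, float]:
    """One experiment: hold out one case, shuffle the remaining training
    cases, and score the held-out (unseen) case each time the memory
    reaches a checkpoint size. This measures predictiveness."""
    test_case = cases[test_index]
    training = [c for i, c in enumerate(cases) if i != test_index]
    rng.shuffle(training)  # a different training order in every experiment
    return {
        size: score(learn(training[:size]), test_case)
        for size in checkpoints(n, k, len(training))
    }


def run_experiments(
    cases: Sequence[C],
    learn: Callable[[List[C]], M],
    score: Callable[[M, C], float],
    n: int,
    k: int,
    experiments: int = 10,
    seed: int = 0,
) -> List[Dict[int, float]]:
    """Repeat the experiment with `experiments` distinct random test cases."""
    rng = random.Random(seed)
    test_indices = rng.sample(range(len(cases)), experiments)
    return [leave_one_out_curve(cases, i, learn, score, n, k, rng)
            for i in test_indices]
```

Scoring a case taken from the training prefix instead of the held-out case would give the identification measure; averaging the ten curves point by point gives a predictiveness curve of the kind plotted in figure 4.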

Results on identification show that BOLERO learns to solve the cases exactly (100%) as the teacher does; the number of cases in memory does not affect the identification of a case in memory. Results on predictiveness are given in figure 4. The system performs well: with only a few cases in memory, BOLERO solves problems with a degree of success of around 98% and a degree of focus of around 80%. A full experimental evaluation of the system is provided in [9].


[Figure: two plots; x-axis "number of training cases"]

Fig. 4. Predictive results on BOLERO.


5  Conclusions 

The experiment reported in this paper shows that case-based planning is a useful technique for acquiring strategic knowledge. A high degree of success is achieved after training on only a few cases, and accuracy increases with the number of training cases. The main conclusion is that strategic knowledge can be acquired automatically, in a few minutes, from the medical cases and the expert's advice, whereas hand-coding the strategic knowledge takes the expert or a knowledge engineer many days (two weeks full-time in the pneumonia example shown in the paper [14]). Moreover, the medical cases needed to learn plans were also needed in the hand-coding process to validate the strategic knowledge, so no new effort was required.

Two main case-based planners are related to our research: CHEF [6] and SMART [13]. BOLERO takes an innovative approach to case-based planning, motivated by the fact that it has been designed to be useful for medical diagnosis. The evaluation method, the termination-condition learning method, and the capability of performing reactive planning are characteristics of our system that neither CHEF nor SMART has.

References 

1. P.P. Bonissone and K.S. Decker. Selecting Uncertainty Calculi and Granularity: An Experiment in Trading-off Precision and Complexity. In L.N. Kanal and J.F. Lemmer, eds., Uncertainty in AI, pp. 217-247. Elsevier Science Pub., North-Holland, 1986.

2. W.J. Clancey. The Advantages of Abstract Control Knowledge in Expert Systems. In Proc. AAAI, pages 74-78, Washington, D.C., 1983.

3. J.H. Gennari, P. Langley, and D. Fisher. Models of Incremental Concept Formation. In J. Carbonell, editor, Machine Learning: Paradigms and Methods, pages 11-61. MIT/Elsevier, 1990.

4. M.P. Georgeff and A.L. Lansky. Reactive Reasoning and Planning. In J. Allen and A. Tate, editors, Readings in Planning, pp. 729-734. Morgan Kaufmann Pub. Inc., 1990.

5. L. Godo, R. López de Mántaras, C. Sierra, and A. Verdaguer. MILORD: The Architecture and Management of Linguistically Expressed Uncertainty. International Journal of Intelligent Systems, 4(4):471-501, 1989.

6. K.J. Hammond. Case-Based Planning: Viewing Planning as a Memory Task, volume 1 of Perspectives in Artificial Intelligence. Academic Press, Inc., 1989.

7. J.A. Hendler. The Design and Implementation of Marker-passing Systems. Connection Science, 1(1), 1989.

8. B. López. CONKRET: A Control Knowledge Refinement Tool. In M. Ayel and J.P. Laurent, editors, Validation, Verification and Test of Knowledge-Based Systems, chapter 13, pages 191-203. John Wiley & Sons, 1991.

9. B. López. Aprenentatge de plans per a sistemes experts. PhD thesis, Universitat Politècnica de Catalunya, Facultat d'Informàtica de Barcelona, 1993. In preparation.

10. B. López. Reactive Planning Through the Integration of a Case-Based System and a Rule-Based System. In Proc. AISB, University of Birmingham, UK, 1993. To appear.

11. P. Maes. Issues in Computational Reflection. In P. Maes and D. Nardi, editors, Meta-level Architectures and Reflection, pages 21-35. Elsevier Science Pub. B.V., 1988.

12. N. Piera, Ph. Desroches, and J. Aguilar-Martin. LAMDA: An Incremental Conceptual Clustering Method. Technical Report 89420, Laboratoire d'Automatique et d'Analyse des Systèmes (LAAS), Toulouse, France, 1989.

13. M.M. Veloso. Learning by Analogical Reasoning in General Problem Solving. PhD thesis, School of Computer Science, Carnegie Mellon University, August 1992.

14. A. Verdaguer. PNEUMON-IA: Desenvolupament i Validació d'un Sistema Expert d'Ajuda al Diagnòstic Mèdic. PhD thesis, Universitat Autònoma de Barcelona, 1989.


MethoDex:

A Methodology for Expert Systems Development


J.P. Klut and J.H.P. Eloff
Department  of  Computer  Science 
Rand  Afrikaans  University 
PO  Box  524 
Johannesburg 
2000 

South  Africa 


Abstract: 

A framework that can help users, experts and data processing personnel to be effectively involved in the development of Expert Systems is proposed and discussed. The approach taken is to evaluate the two most basic approaches used in industry today: the use of a standard methodology (SDLC), which is also used for the development of general business systems, and the well-known Knowledge Engineering Cycle approach. The proposed methodology for Expert Systems development (MethoDex) is designed as a three-layered framework: activities, procedures and resources. MethoDex has been developed within a life assurance industry environment (Liberty Life Association of Africa Limited) and has already proved itself a valuable framework for the successful development of Expert Systems within a commercial environment.


1  Introduction 

Expert Systems (ESs) are today a fast-growing and developing information technology. Programming knowledge to create commercial applications has been made feasible by the introduction of many types of knowledge representation tools, of which the "Shell" is probably the best known. A major reason for the acceptance of this technology is the relative ease of use of the early commercial "Shells" and their non-procedural way of representing knowledge, mostly in terms of rules. Examples are VP-Expert, Level5, Exsys and Personal Consultant Plus.

Early knowledge representation schemes (such as rules) proved not to be functional for all types of knowledge problems [1]. This fact, combined with the realization that the building of ESs is moving out of the AI labs into the normal corporate systems development areas [2], has led to systems development and management problems [3].

There are many references to the development and management of ESs, but they are either specific to a particular Expert System (ES) application [4] or simply address the Knowledge Engineering design and development life cycle [5]. In most cases these references advocate prototyping [6], and in many others a strongly procedural approach [7] is adopted. However, very little reference exists to a general systems development life cycle for an "Expert System" or knowledge-based application. This reflects the current state of ES project management and adds to the confusion and difficulty in its practical implementation. Without clear guidelines on how to manage and develop ESs, this predicament will continue, much to the detriment of the acceptance and growth of this technology.

The building of an ES has many facets, of which the engineering of knowledge is one. In principle the KE Cycle model [5] is used as a guideline for the Knowledge Engineering process (the analysis, design, development and testing of knowledge or expertise). Knowledge Engineering can be done in many ways and in many instances depends on the complexity of the expertise to be represented. To represent expertise, the KE Cycle can be used successfully. In this instance the Knowledge Engineering process is based specifically on a prototyping process model, as opposed to a function or data process model [8].

However, basing the overall methodology for ES development, from ES inception to ES maintenance, on the KE cycle, alone or in combination with the above-mentioned process models, is incomplete: it does not provide a comprehensive solution to the problems and complexities involved in the development of ESs. The first main reason is that certain specific activities that mark the formal initiation and finalization of a project are disregarded. The second main reason is that the KE cycle model assumes abstraction as project development progresses: it is applied in the same way at the macro and micro level, for activities and tasks alike. For example, Analysis (Identification and Conceptualization in the KE cycle) can imply analysis of:

a) a single expertise entity (a fact or object),

b) knowledge (e.g. a rule) or even knowledge base structuring, or

c) the total project from a management and Knowledge Engineering point of view.

Procedures in this approach thus do not refer to a series of specific steps, and can overemphasize the abstract concepts inherent in the KE cycle.

The use of a conventional Systems Development Life Cycle (SDLC) approach for the development of ESs also has shortcomings. The main one is that in a conventional SDLC the underlying basis is the programming of procedures, whereas in an ES the underlying basis is the programming of knowledge.

This situation accounts for many ES project management problems: management has to manage an ES project using a combination of the above approaches. This is the basis for the methodology MethoDex, proposed later in this paper.

2  DESIRED  COMPONENTS  IN  THE  EXPERT  SYSTEMS 
DEVELOPMENT  LIFE  CYCLE. 

Based on the Knowledge Engineering model and using a bottom-up approach, the authors identified the following primitives needed in the ES development life cycle:
1. The KE cycle, which is based on a prototyping process model, used to produce a prototype.

2. Two development deliverables: the demonstration and the operational (production) ES prototypes.

The  demonstration  prototype  serves  as  an  analysis,  demonstration  and  risk 
assessment  vehicle. 

The  operational  prototype  is  the  production  version  of  the  demonstration 
prototype.  It  is  based  on  the  same  process  model  but  with  a  different  emphasis 
on  certain  procedures. 

3. Procedural project management aspects, for example: initial actions taken to identify the ES application and to initiate the project, as well as actions taken to ensure the implementation and continuous tuning and updating of the ES application.

To provide a framework that logically incorporates these primitives into an ES methodology, the authors use some principles of the waterfall model [9] as well as the "prototype and risk assessment" concept used in the Spiral model proposed by Boehm [10]. These principles are used to address the development life cycle components from initiation to completion.

3 MethoDex: A PROPOSED METHODOLOGY FOR THE
DEVELOPMENT OF EXPERT SYSTEMS.

The above desired components are designed and assembled into a proposed methodology, MethoDex, an acronym for "A Methodology for Expert Systems". A high-level view of MethoDex is shown in figure 1.

A  conical  layered  approach  is  used  to  describe  three  important  facets  of  MethoDex: 

•  The  central  triangular  view  of  the  model  shows  MethoDex  activities  and  their 
dependencies  on  one  another. 

• The right-hand triangular view shows the activities in relation to the conventional systems development life cycle phases. This illustration, indicated by the checkered circle on the diagram, shows that in MethoDex some of these phases overlap and are executed in parallel.

•  The  left  triangular  view  of  the  model  shows  some  project  management 
components  that  need  to  be  taken  into  account  during  the  creation  of  the  ES.  In 
particular,  second-level  entry  points  to  MethoDex  refer  to  those  stages  in  the 
methodology  where  sub-knowledge  bases  or  totally  new  ESs  need  to  be 
developed  as  a  result  of  analysis  and  design  criteria.  When  these  entry  points  are 
used,  MethoDex  is  executed  again  to  address  the  sub-level  of  the  main  ES 
development  effort.  MethoDex  is  thus  used  in  the  same  fashion  (possibly  with  a 
different  emphasis  on  some  MethoDex  components)  for  sub-level  knowledge 
definition  and  creation. 

A  more  detailed  view  of  MethoDex  is  shown  in  figures  2a  and  2b. 


4 THE MECHANICS OF MethoDex.

4.1 Activities

Activities in MethoDex are logically based on the primitives that are needed in a methodology for ES development.

Activities  serve  as  milestones  or  checkpoints  and  carry  important  information  that  is 
needed  in  the  subsequent  activity. 

Some activities produce specific deliverables that serve as a reference for the management of the ES development (fig. 1, activities A4 and A7). Figures 2a and 2b show how activities are linked horizontally to one another. Because activities are linked from left to right and executed only once, this conventional SDLC approach ensures that the development of the ES does not fall into the trap of never-ending prototyping [11]. In the authors' opinion this is one of the very real dangers in early ES development. The "one-off execution" feature ensures that the system scope is clearly defined and more easily adhered to. It also simplifies the estimation of project duration. The two most important activities in MethoDex are the construction of the Demonstration (A4) and Operational (A7) prototypes.


[Figure: side views labelled "Project Management Components" and "Systems Development Life Cycle"]

Fig. 1. A high-level view of MethoDex.


4.2  Procedures 

Procedures are a set of guidelines and steps that need to be performed in a certain way to create an environment in which all activities can be successfully completed (see figures 2a and 2b). In the creation of this environment, information in terms of resources and deliverables is used and created.


[Figures 2a and 2b: detailed view of MethoDex, with panels labelled "Initial Requirements Phase", "Demonstration Prototype Phase" and "Operational Prototype Phase"]


A set of procedures relates to a specific activity. Input or information is supplied from the procedures of previously completed activities. If an activity cannot be completed because of a shortage of information input from preceding activities, neighboring procedures can be revisited to obtain the required input. It is important to note that the development process never moves back to previous activities. When revisited procedures produce a change in information in their activities, this change may influence the execution of the next activity's procedures. A good example of this situation is found in activities A7, A8 and A10.

When the maintenance team has been established and a knowledge change request needs to be implemented, the procedures of activities A7 and A8 are executed again. Although the activities for these procedures are termed "Building the Operational Prototype" and "Final Operational Product", the current project development activity is known as "Maintenance and Implementation". The reason for taking this approach is to ensure a controlled and structured development effort, which makes functional project management possible.

Procedures can result in deliverables such as the feasibility study (A2, p2 & p3).

4.3  Resources 

Resources are those components of the methodology that are used by the procedures of an activity to allow its successful completion. These resource components form part of the domain in which MethoDex operates. Resources are classified into four basic categories, namely People, Tools, Techniques and Deliverables.
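To make the three layers concrete, the following Python sketch models activities, procedures and resources, and enforces the two rules stated above: activities are completed only once, from left to right, while the procedures of already completed activities may be revisited to obtain missing input. This is an illustrative data model, not part of MethoDex as published; all class, method and procedure names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceKind(Enum):
    """The four basic resource categories named in MethoDex."""
    PEOPLE = "People"
    TOOLS = "Tools"
    TECHNIQUES = "Techniques"
    DELIVERABLES = "Deliverables"


@dataclass
class Resource:
    name: str
    kind: ResourceKind


@dataclass
class Procedure:
    name: str
    resources: List[Resource] = field(default_factory=list)
    executions: int = 0

    def execute(self) -> None:
        # Placeholder for the real work; here we only count executions.
        self.executions += 1


@dataclass
class Activity:
    code: str  # e.g. "A4"
    title: str
    procedures: List[Procedure] = field(default_factory=list)


class MethoDexProject:
    """Hypothetical project tracker: one-off, left-to-right activities."""

    def __init__(self, activities: List[Activity]) -> None:
        self.activities = activities
        self.position = 0  # index of the next activity to complete

    def complete_next_activity(self) -> Activity:
        if self.position >= len(self.activities):
            raise RuntimeError("all activities have been completed")
        activity = self.activities[self.position]
        for procedure in activity.procedures:
            procedure.execute()
        self.position += 1  # the project never moves back to an earlier activity
        return activity

    def revisit_procedure(self, code: str, name: str) -> Procedure:
        """Re-execute a procedure of an already completed activity."""
        completed = {a.code: a for a in self.activities[: self.position]}
        if code not in completed:
            raise ValueError(f"activity {code} has not been completed yet")
        for procedure in completed[code].procedures:
            if procedure.name == name:
                procedure.execute()
                return procedure
        raise ValueError(f"activity {code} has no procedure {name!r}")
```

Revisiting a procedure re-runs it without moving `position` backwards, which mirrors the rule that the development process never returns to a previous activity.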

5  THE  COMPONENTS  OF  THE  MethoDex  METHODOLOGY 

Because of the page limits of these proceedings, the discussion of the MethoDex components is narrowed down to activities A4 and A5. As MethoDex is a methodology framework, organizations that adopt it as an ES methodology should customize it accordingly, viewing the framework from an activity point of view. The discussion that follows can serve as an example.

ACTIVITY A4: BUILDING THE DEMONSTRATION PROTOTYPE

A4 is a crucial activity in the methodology. All succeeding activities depend heavily on its completion.

The goals of the demonstration prototype are:

• To serve as a vehicle for gathering the expertise quickly and feasibly [12]. This can be achieved by the use of a knowledge engineering tool (e.g. an ES shell) that offers practical prototyping (KE Cycle) capabilities. The use of such a tool maintains the momentum of the analysis and provides an interactive development environment.

• The demonstration prototype must also serve as a model of the proposed system in determining the scope and function of the total project.

• The demonstration prototype serves as a model of the expertise that will be represented or programmed. The model thus assists the KE in determining the overall detailed feasibility and applicability of the proposed expert system.

• The demonstration system also gives the KE an indication of what type of knowledge representation scheme and tool should be used, for example: rules, objects, frames, semantic or neural nets [13].

• The demonstration prototype is an important tool to introduce the experts to the technology, and serves to demonstrate the extent and possibilities of this type of system.

Procedures 

Because of these specific characteristics of the demonstration prototype, the KE should focus on a sub-section of the expertise that is neither too difficult nor too easy to represent. This activity should have a definite and realistic time frame during which specific objectives and issues can be achieved and assessed. Some iterations of the KE Cycle will focus more on certain procedures; for example, one cycle can be used mainly for testing represented knowledge that was analyzed and gathered during a previous cycle. Any analysis carried out during this activity is based on the initial analysis done in activity A3.

It  is  again  stressed  that  this  activity  focuses  on  the  demonstration  and  analysis 
value  of  the  prototype.  Technical  issues  like  machine  requirements,  screen  design, 
etc.  take  a  lower  priority. 

Resources. 

During this activity, ES, knowledge acquisition and knowledge engineering tools are used. Management, the KE and the experts are the main parties involved.

Activity  A4  is  complete  when  the  KE  has  enough  information  to  establish  and 
achieve  the  following  deliverables: 

•  A  demonstration  of  the  ES  capabilities. 

•  An  analysis  of  the  expertise  in  the  problem  domain  in  order  to  determine  the 
knowledge  representation  scheme. 

• A representative problem on which Knowledge Engineering tools can be evaluated and selected.

ACTIVITY A5: DEMONSTRATION PROTOTYPE REVIEW

The purpose of this activity is to analyze the scope and impact that the ES will have in two areas: first, business issues, e.g. cultural influences, and second, technical issues, such as the Knowledge Engineering tool to be used, system interfaces, knowledge representation schemes and the like.

Procedures. 

The KE conducts an impact study, which involves reviewing the scope and impact of the system in technical and business terms. In many instances it is found that the initial problem definition and domain scope have changed and now differ from what was first perceived. This change can have a different impact on the business and technical domain than was first envisaged.

Resources 

The involvement of senior management during this activity is necessary because the impact of the proposed system may be felt throughout the whole organization, as was the case with the development of XCON [14]. A report detailing the previous activities, leading to a recommendation to continue or abandon the project, is then compiled.

Information such as feasibility studies and duration estimates is crucial to management. With the review of every demonstration prototype and the production of every operational ES, information is produced to update the strategic plan [15, 16] for the implementation of knowledge-based or Expert Systems.

Approval from senior management to continue the ES project will be based on the strategic plan, the needs of the business, economic considerations and the reports that the KE has produced.

7  CONCLUSION. 

An overview of MethoDex, a methodology for developing ESs, has been given. This methodology is based on the conventional SDLC (the revised waterfall model) combined with a prototyping approach (the KE cycle).

The commercial use of ESs by the normal corporate IT function is growing rapidly. The days when AI specialists "hacked" away and produced efficient but "nearly impossible to maintain" AI systems are over. Functional and efficient approaches to developing ESs, jointly and separately, in a clear and manageable way, are desperately needed. A great deal remains to be done regarding the method of implementing this technology, and this will be realised through experience. MethoDex tries to address this need and to form a basis on which the commercial systems development function of an organization can build its Expert Systems.

1. Mellis W. Twaice: A Knowledge Engineering Tool. Information Systems, Vol. 15, No. 1, Nov. (1990), pp. 137-150.

2. Maletz M.C. Expert Systems: From the Research Laboratories to End-User Deployment. Expert Systems: Planning/Implementation/Integration, Vol. 1, No. 1, Spring (1989), pp. 38-43.

3. Keyes J. Why Expert Systems Fail. AI Expert, Vol. 4, No. 11, Nov. (1989), pp. 50-53.




4. Kenny J., Jih W.J. Comparing Knowledge Based and Transaction Processing Development. ASM Journal of Systems Management, May (1990), pp. 23-28.

5. Hayes-Roth F., Waterman D.A., Lenat D.B. Building Expert Systems. Addison-Wesley, Reading, Massachusetts (1983).

6. Sprague K.S. Cultivating a Prototyping Approach to Expert Systems Development. Expert Systems: Planning/Integration/Implementation, Vol. 2, No. 3, Fall (1990), pp. 37-43.

7. Anar P. Towards Structured Expert Systems Development. Expert Systems with Applications, Vol. 1, No. 1 (1990), pp. 63-70.

8. Jarke M., Jeusfeld M., Rose T. A Software Process Data Model for Knowledge Engineering in Information Systems. Information Systems, Vol. 15, No. 1 (1990), pp. 85-116.

9. Boehm B.W. Software Engineering Economics. Prentice-Hall, Inc., Englewood Cliffs, NJ (1981).

10. Boehm B.W. A Spiral Model of Software Development and Enhancement. Computer, May (1988), pp. 61-72.

11. Liebowitz J. When is a Prototype an Expert System? Expert Systems, Vol. 3, No. 1, Spring (1991), pp. 17-21.

12. Maletz M.C. Expert Systems: From the Research Laboratories to End-User Deployment. Expert Systems: Planning/Implementation/Integration, Vol. 1, No. 1, Spring (1989), pp. 38-43.

13. Prerau D.S. Developing and Managing Expert Systems: Proven Techniques for Business and Industry. Addison-Wesley (1990), pp. 248-259.

14. Sviokla J.J. An Examination of the Impact of Expert Systems on the Firm: The Case of XCON. MIS Quarterly, June (1990), pp. 127-140.

15. Moulin B. Strategic Planning for Expert Systems. IEEE Expert, Vol. 5, No. 2, April (1990), pp. 69-75.

16. Meador C.L., Mahler E.G. Choosing an Expert Systems Game Plan. Datamation, Aug. 1 (1990), pp. 64-69.


Towards  Intelligent  Databases 


François Bry

ECRC, Arabellastraße 17, 81925 München 81, Germany
Francois.Bry@ecrc.de


Abstract. This article is a presentation of the objectives and techniques of deductive databases. The deductive approach to databases aims at extending with intensional definitions other database paradigms that describe applications extensionally. We first show how constructive specifications can be expressed with deduction rules, and how normative conditions can be defined using integrity constraints. We outline the principles of bottom-up and top-down query answering procedures and present the techniques used for integrity checking. We then argue that it is often desirable to manage with a database system not only database applications, but also specifications of system components. We present such meta-level specifications and discuss their advantages over conventional approaches.


1  Introduction 


Deductive databases have been studied for more than a decade. Theoretical issues have been investigated (see e.g. [28, 29, 30, 31, 65, 21, 8, 48, 64, 17, 18, 44, 45] for an overview), and experimental deductive database management systems have been and are still being implemented (e.g. [54, 9, 23, 26, 32, 34, 51, 56, 66, 33, 68, 40, 52]). Industrial products are currently being developed from research prototypes (e.g. [69]). This article is an informal presentation of the notions and objectives of deductive databases. Instead of emphasizing technical aspects (which are explained in a number of articles and tutorials, e.g. [28, 29, 30, 31, 65, 21, 8, 17, 18]), we prefer to focus on the goals of the deductive approach to databases.

The first part of the presentation recalls how two complementary notions are used in deductive databases for declaratively specifying an application. On the one hand, deduction rules are used for constructive definitions. On the other hand, normative specifications are expressed through integrity constraints. We informally describe how deduction rules are evaluated for answering queries (see e.g. [17, 18, 1, 2, 4, 7, 55, 57, 60, 61, 67, 13]), and how integrity constraints are checked when the database is updated (see e.g. [17, 18, 15, 25, 39, 43, 47, 49, 53, 19]).
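As a rough illustration of the two evaluation styles and of integrity checking mentioned above, the Python sketch below uses a hypothetical `flight`/`connection` example; the city names, the rules and the constraint are assumptions made for illustration only. `bottom_up_connections` derives all consequences of the rules by naive fixpoint iteration; `top_down_connections` answers a single query `connection(source, Y)` in a goal-directed way, with a visited set loosely playing the role of tabling; `insert_flight` checks the constraint against the inserted fact only, which suffices here because the constraint concerns single tuples.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # a flight(From, To) or connection(From, To) fact


def bottom_up_connections(flights: Set[Fact]) -> Set[Fact]:
    """Naive bottom-up evaluation of the illustrative rules
         connection(X, Y) <- flight(X, Y)
         connection(X, Z) <- flight(X, Y), connection(Y, Z)
    Rules are applied to all facts derived so far until nothing new
    can be derived, i.e. until a fixpoint is reached."""
    derived = set(flights)  # first rule: every flight is a connection
    while True:
        new = {(x, z) for (x, y) in flights for (y2, z) in derived if y == y2}
        if new <= derived:
            return derived
        derived |= new


def top_down_connections(flights: Set[Fact], source: str) -> Set[str]:
    """Goal-directed answering of connection(source, Y): only connections
    starting from cities reachable from `source` are derived. The visited
    set avoids re-solving the same subgoal, so cycles cannot cause loops."""
    answers: Set[str] = set()
    visited = {source}
    stack = [source]
    while stack:
        city = stack.pop()
        for (x, y) in flights:
            if x == city:
                answers.add(y)
                if y not in visited:
                    visited.add(y)
                    stack.append(y)
    return answers


def violates_constraint(fact: Fact) -> bool:
    """Illustrative integrity constraint: no flight departs from and
    arrives at the same city."""
    departure, arrival = fact
    return departure == arrival


def insert_flight(flights: Set[Fact], fact: Fact) -> Set[Fact]:
    """Simplified integrity checking: only the inserted fact is tested,
    assuming the current database already satisfies the constraint."""
    if violates_constraint(fact):
        raise ValueError(f"update {fact} violates the integrity constraint")
    return flights | {fact}


flights = {("Munich", "Paris"), ("Paris", "London"), ("London", "Dublin")}
print(sorted(bottom_up_connections(flights)))
print(top_down_connections(flights, "Munich"))
```

Both procedures compute the same answers for a given source; the bottom-up one materializes every connection, whereas the goal-directed one only explores what the query needs.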

In the second part of the presentation, we argue that it is often desirable to manage with the database system not only an application, but also specifications of components of the database system itself, the description of an application, or various kinds of interpretations of this application. We informally introduce a few such meta-level specifications, which rely on meta-programming [58, 62, 63, 59]. Finally, we briefly mention further applications of meta-level specifications towards enhanced database management systems.


2  An  Introduction  to  Deductive  Databases 

A main trend in database research is the enhancement of data modeling facilities. Deductive database techniques aim at extending conventional, nondeductive databases, in which data are extensionally specified, with intensional definitions in the form of deduction rules and integrity constraints.

Database management systems historically developed from file managers, in which applications are specified in terms of records structured according to storage and retrieval criteria. Two data models were proposed at the end of the sixties and the beginning of the seventies to improve the description of applications: the hierarchical and the network data models. Like a file, a hierarchical or network database consists of records. However, in contrast to files, records are structured in trees, and pointers express relationships between records. Both the hierarchical and the network data models have a major drawback: the pointers these data models rely upon make the design and the querying of databases rather difficult. Database users must be aware of rather complex networks even to pose simple queries.

The relational data model, defined by Codd [24] at the beginning of the seventies, overcomes this difficulty in an elegant manner: no pointers are used, and the conceptual links between records (called tuples) are expressed through regular data. A relational database consists of a set of relations, and relations are sets of tuples. The semantic relationships between tuples are expressed through the values they contain. Thus, for example, the presence of the same character string (say, a name) in a tuple of a “salary” relation and in a tuple of an “address” relation links salaries, addresses, and employees' names. Because they are value-based, relational databases can be interpreted mathematically as logical theories consisting of formulas or, alternatively, as logical models consisting of relations. Relational databases can be seen as more declarative than hierarchical or network databases, since less knowledge of their internal structure is necessary for querying them. Indeed, knowledge of the relations' names, the so-called database schema, and possibly of some values occurring in tuples, suffices for posing queries.
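To make the value-based linking concrete, here is a minimal Python sketch, assuming relations are represented as sets of tuples. The relation contents and the helper name `join_on_name` are hypothetical; the only link between the two relations is an equal name value, never a pointer.

```python
# Hypothetical "salary" and "address" relations: plain sets of tuples.
salary = {("smith", 4200), ("jones", 3900)}
address = {("smith", "Munich"), ("jones", "Paris"), ("brown", "Oslo")}


def join_on_name(salary, address):
    """Link the two relations through the shared name value (a natural join)."""
    return {(name, amount, city)
            for (name, amount) in salary
            for (other, city) in address
            if name == other}


result = join_on_name(salary, address)
```

The "brown" tuple has no matching salary value and therefore does not appear in the result.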


2.1  Deduction  Rules 

Deductive databases can be seen as an extension of the relational model. In a relational database, the data are specified extensionally; that is, the tuples of a relational database are explicitly defined. Deductive databases, in contrast, also make it possible to specify data intensionally by means of general properties, expressed using deduction rules. Consider for example the time table of the Lufthansa airline. The Lufthansa direct flights from Munich to Paris can be specified by the following “flight” relation:


Day        Departure  Arrival  Flight
Monday     0725       0900     LH4356
Tuesday    0725       0900     LH4356
Wednesday  0725       0900     LH4356
Thursday   0725       0900     LH4356
Friday     0725       0900     LH4356
Saturday   0725       0900     LH4356
Monday     1110       1245     LH4384
Tuesday    1110       1245     LH4384
Wednesday  1110       1245     LH4384
Thursday   1110       1245     LH4384
Friday     1110       1245     LH4384

The  first  attribute  (column)  of  this  relation  indicates  the  day  of  the  flight,  the  second 
and  third  are  the  departure  and  arrival  times,  respectively,  and  the  last  attribute 
is  the  flight  number.  These  eleven  flights  could  be  specified  by  the  following  two 
deduction  rules  that  somehow  “factorize”  the  data  common  to  several  tuples: 

flight(D, 0725, 0900, lh4356) <- day(D), not D = sunday.

flight(D, 1110, 1245, lh4384) <- day(D), not D = saturday, not D = sunday.

As usual, character strings beginning with an upper case letter (e.g. D) are used for denoting (logical) variables. The membership of a tuple “t” (called a “fact” in deductive databases) in a relation “r” is expressed by the term “r(t)”. We assume that “day” denotes the relation containing the seven days of the week (monday, tuesday, etc.). Lower case letters are used for distinguishing these constant values from variables. The expression “day(D)” can thus be evaluated to the facts “day(monday)”, “day(tuesday)”, etc. The meaning of the first rule is that the facts “flight(monday, 0725, 0900, lh4356)”, “flight(tuesday, 0725, 0900, lh4356)”, ..., “flight(saturday, 0725, 0900, lh4356)” are derivable, i.e. are true facts in the database. In more technical terms, the variable D is (implicitly) universally quantified. The first deduction rule is thus a shorthand notation for the following formula:

$$\forall D\; [\, (day(D) \land D \neq sunday) \Rightarrow flight(D, 0725, 0900, lh4356) \,]$$
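As a sanity check on the factorization, the following Python sketch evaluates the two flight rules over the “day” relation and collects the derived facts. It assumes facts are tuples and times are kept as strings (to preserve the leading zero); `derive_flights` is a hypothetical name.

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def derive_flights(days):
    """Evaluate the two flight rules against the extensional day relation."""
    facts = set()
    for d in days:
        # flight(D, 0725, 0900, lh4356) <- day(D), not D = sunday.
        if not d == "sunday":
            facts.add((d, "0725", "0900", "lh4356"))
        # flight(D, 1110, 1245, lh4384) <- day(D), not D = saturday, not D = sunday.
        if not d == "saturday" and not d == "sunday":
            facts.add((d, "1110", "1245", "lh4384"))
    return facts


flights = derive_flights(DAYS)
```

The two rules yield exactly the eleven tuples of the relation above: six for LH4356 and five for LH4384.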

This simple example illustrates two important advantages of deductive databases compared with relational ones: (1) they require less storage, and (2) they give rise to more natural specifications. The possible size reduction is sometimes dramatic: an analysis of the time table of the Munich public transportation system shows, for example, a reduction factor of about 200! Database applications whose data cannot be specified according to general principles do not benefit as much from deductive techniques. Most databases nevertheless contain some data that are implied by general laws (e.g. business rules, legislation, scientific laws, etc.) and can therefore benefit from deductive database techniques.

One could object that no deductive techniques are needed for achieving the factorization described above. This is true. For this example, there are indeed two alternative ways to avoid the undesirable duplication of data using relational data structures. The first approach consists of splitting the original relation into two distinct relations: the first one giving the day and the flight number (which together obviously form a key), the second one giving the times and the flight numbers. A join then permits one to reconstruct the original relation at query time. The second approach consists of using codes, as in the following table, for expressing on which days a flight is available.


Days   Departure  Arrival  Flight
Xe7    0725       0900     LH4356
Xe67   1110       1245     LH4384

In this relation, X stands for every day of the week, 6 for Saturdays, 7 for Sundays, Xe7 for every day except Sundays, and Xe67 for every weekday.*

We argue that both approaches have severe drawbacks. The first approach (splitting the original relation into two smaller relations) exemplifies an often criticized (although necessary) practice in relational database design: for reasons of storage (size) and coherence of the data (when updates are performed), the natural description of an application usually needs to be modified. The two rules given above, by contrast, achieve the same effect without compromising the natural character of the specification. The second approach (encoding the days in the tuples) is very close to a specification by means of deduction rules. The difference, however, is that the encoding is a notation “unknown” to the database management system, while deduction rules are “understood” by a deductive database system for what they are. Such an encoding is specific to a given application and must be interpreted in the application programs, that is, outside the database system. Deduction rules, in contrast, make it possible to interpret intensional knowledge within the database system.
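The following sketch shows the kind of application-side code the encoding forces on its users: the database stores only an opaque string, and every program must know the convention. It assumes, from the description of the table, that “X” means every day, “e” means “except”, and digits denote days with 6 = Saturday and 7 = Sunday (hence 1 = Monday); `decode_days` is a hypothetical helper, not part of any database system.

```python
def decode_days(code):
    """Expand a day code such as "X", "Xe7", "Xe67" or "6" into day numbers.

    Assumed convention: 1 = Monday ... 7 = Sunday, "X" = every day,
    "e" = "except", and a code made of digits lists the days directly.
    """
    if code.startswith("X"):
        days = set(range(1, 8))
        rest = code[1:]
        if rest.startswith("e"):
            # Remove every excepted day listed after the "e".
            days -= {int(c) for c in rest[1:]}
        return days
    return {int(c) for c in code}
```

A deduction rule such as the ones given earlier expresses the same information in a form the database system itself can evaluate.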

Deduction rules can also be used in lieu of relational views. In relational databases, views are means for expressing predefined queries. One could, for example, define connecting flights using a view: a connecting flight from A to B is defined from a flight from A to C and a flight from C to B such that some conditions on the departure and arrival times in C, and on the location of the airport C, are satisfied. A recursive definition makes it possible to specify connections involving an indefinite number of flights. Such a definition is quite naturally expressed by the following deduction rules:

connection(D, T1, T2, [Nb]) <- flight(D, T1, T2, Nb).
connection(D, T1, T2, [Nb | L]) <- flight(D, T1, T3, Nb),
                                   connection(D, T3, T2, L),
                                   compatible(Nb, L).

The first rule specifies a connection consisting of one single flight. The list of flights involved in this connection ([Nb]) thus contains only one flight number. The second rule “links” a flight to a connection and extends its list of flight numbers. The predicate “compatible” is assumed to express whether times and airports are compatible in a connection. It might be specified intensionally by means of deduction rules, or extensionally by a relation. Recursive specifications are important in practice for specifying several natural properties that apply to an indefinite number of objects. Another example is the definition of a “bill of materials”: the price of a complex object is obtained by summing up the prices of its parts, whose prices are in turn similarly defined. As for flight connections, it is desirable to have at our disposal a specification which is not limited to a given number of components (e.g. flights or parts). It has often been observed that recursive specifications are hardly avoidable in real-life applications.
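The recursive connection rules can be sketched bottom-up in Python: start from single flights, then repeatedly prepend a flight to an existing connection until nothing new appears. This is a variant under stated assumptions: the flights are hypothetical and carry origin and destination airports, and the hypothetical `compatible` check (same airport, at least 45 minutes to transfer) stands in for the unspecified “compatible” predicate.

```python
MIN_CONNECT = 45  # hypothetical minimum connecting time, in minutes

# Hypothetical flights: (day, origin, destination, departure, arrival, number),
# with times given in minutes after midnight.
FLIGHTS = {
    ("monday", "MUC", "FRA", 420, 490, "lh100"),
    ("monday", "FRA", "OSL", 560, 690, "lh200"),
    ("monday", "FRA", "OSL", 500, 630, "lh201"),
    ("monday", "OSL", "TRD", 760, 815, "sk300"),
}


def compatible(arr_airport, arr_time, next_origin, next_departure):
    """Hypothetical stand-in for compatible: same airport, enough transfer time."""
    return arr_airport == next_origin and next_departure - arr_time >= MIN_CONNECT


def connections(flights):
    """Least fixpoint of the two connection rules, computed bottom-up."""
    # Base rule: a single flight is a connection with a one-element list.
    conns = {(d, o, a, dep, arr, (nb,)) for (d, o, a, dep, arr, nb) in flights}
    while True:
        # Recursive rule: prepend a flight to an existing, compatible connection.
        new = {
            (d, o, c_dest, dep, c_arr, (nb,) + c_list)
            for (d, o, a, dep, arr, nb) in flights
            for (c_day, c_orig, c_dest, c_dep, c_arr, c_list) in conns
            if d == c_day and compatible(a, arr, c_orig, c_dep)
        }
        if new <= conns:
            return conns
        conns |= new


result = connections(FLIGHTS)
```

No bound on the number of legs is written into the code; the three-leg Munich to Trondheim connection emerges from the iteration, while lh100 followed by lh201 is rejected because only 10 minutes separate them.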

Deduction rules thus are very similar to relational views. Since the first relational database systems were not capable of handling recursive views, deduction rules are often seen as the extension of relational views to recursion. In our opinion, deduction rules are more than extended views. In a relational database management system, views are not handled like regular data, i.e. tuples and relations, while deduction rules should be seen as first-class citizens in a deductive database system. This means that all the facilities provided by the system for storing, retrieving, updating, and querying extensional specifications (i.e. facts) should also be applicable to intensionally defined data (i.e. data defined by deduction rules) and to the intensional specifications themselves. The full realization of this objective is still the subject of active research.

* This representation is taken from the time table booklet published by Lufthansa.


2.2  Remarks  on  the  Language  of  Deduction  Rules 

The deduction rules specifying connecting flights (cf. previous section) contain complex, nested terms, namely lists. It is often believed that nested terms and term constructors should be prohibited in deductive databases. We think that nested terms are needed (as in the above example). Moreover, the known techniques are (almost) sufficient to accommodate them like flat, so-called first-normal form facts. It is probably the concept of Datalog, i.e. the language of rules with flat terms and no negation, that has spread the idea that deductive databases should only specify first-normal form tuples.

In deductive databases, the same form of negation is needed as in relational databases. This negation has been formalized in various manners and under different names (negation as failure, non-monotonic negation, negation according to the closed-world assumption, etc.). Common to these formalizations is the basic notion that an expression can be considered false if it cannot be proved. This interpretation of negation is a rather intuitive form of reasoning. It is this way of thinking that leads us to conclude, for example, that there are no direct flights from Munich to Trondheim if we do not find any in the time table. Although there is general agreement on the semantics of this form of negation for relational databases, it is not always clear how to formalize it in deductive databases. Rules like the following ones are indeed difficult to interpret:

a <- not b.

b <- not a.

“a” should be derivable only if “b” is not derivable, and “b” should be derivable only if “a” is not derivable. Various more or less complex, more or less intuitive proposals have been made for giving convincing interpretations to such examples (and to more sophisticated ones) as well as for defining query answering procedures according to (some of) these interpretations. The problem is not yet completely solved and is still being investigated. There is, however, general agreement on the semantics of negation in so-called stratified deductive databases (or logic programs). The basic idea of stratification is to partition the definitions of predicates hierarchically into strata, such that no predicate definition refers to the negation of a predicate defined in the same or a higher stratum. Since one might have to deal with incompletely, or even incorrectly, specified databases (for example, for debugging at design time), it is desirable to have at our disposal a semantics (and the corresponding answering procedures) which does not impose any syntactical restrictions such as stratification.
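Whether a set of rules is stratified can be tested mechanically. The sketch below is one standard way to do it, not a procedure taken from this paper: it assigns each predicate the least stratum such that a head is at least as high as its positive body predicates and strictly higher than its negated ones. If a stratum ever exceeds the number of predicates, the dependencies contain a cycle through negation. Rules are abstracted to predicate names with a polarity flag, and `stratify` is a hypothetical name.

```python
def stratify(rules):
    """Return a stratum per predicate, or None if the rules are not stratifiable.

    rules: list of (head, [(body_pred, positive), ...]) at the predicate level.
    Requirement: stratum(head) >= stratum(p) for a positive body literal p,
    and stratum(head) > stratum(p) for a negated one.
    """
    preds = {h for h, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, positive in body:
                needed = stratum[pred] + (0 if positive else 1)
                if stratum[head] < needed:
                    stratum[head] = needed
                    changed = True
                    # Strata can only keep growing if negation occurs in a cycle.
                    if stratum[head] > len(preds):
                        return None
    return stratum
```

For example, r <- p, not q together with q <- s is stratified (r lands one stratum above q), whereas the pair a <- not b, b <- not a discussed above is rejected.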




There is, however, a syntactical restriction which is desirable: that of range restriction. Range restriction basically requires that any variable occurring in a negated expression in a query or in the body (i.e. the right-hand side) of a rule also occurs in an unnegated, positive expression. Thus, “p(X), not q(X)” is range-restricted, but “p(X), not q(X, Y)” is not, because the variable Y has no (positive) range. Since, due to the interpretation of negation, negative expressions are absent from the database, range restriction is needed for ensuring that the variables occurring in a query or in a rule body can be assigned values from subexpressions occurring in this query or rule.
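The range-restriction test on a query or rule body is easy to state operationally. In this minimal sketch, a literal is assumed to be a triple (positive, predicate, arguments), and, following the convention of this paper, arguments starting with an upper case letter are variables; `is_range_restricted` is a hypothetical name.

```python
def variables(args):
    """Arguments starting with an upper case letter denote variables."""
    return {a for a in args if isinstance(a, str) and a[:1].isupper()}


def is_range_restricted(body):
    """True if every variable of a negated literal also occurs in a positive one."""
    positive_vars, negative_vars = set(), set()
    for positive, _pred, args in body:
        (positive_vars if positive else negative_vars).update(variables(args))
    return negative_vars <= positive_vars
```

With this test, “p(X), not q(X)” passes, while “p(X), not q(X, Y)” fails because Y occurs only under negation.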


2.3  Integrity  Constraints

Deduction rules make it possible to generate new facts from a database, i.e. deduction rules express constructive specifications. In contrast to deduction rules, integrity constraints are used for expressing non-constructive, normative specifications. Such specifications are needed for ensuring that some properties remain satisfied when data are updated. The following integrity constraint, for example, states that no flights are allowed to land after 23:00:

$$\forall D\, T1\, T2\, Nb\; [\, flight(D, T1, T2, Nb) \Rightarrow T2 \leq 2300 \,]$$

Any attempt to specify a flight landing after 23:00 would lead to a violation of this integrity constraint. This violation would be reported to the database user, who could then modify either the update or, if it no longer appears to be valid, the integrity constraint instead. An integrity constraint can thus be viewed as a yes/no query which is evaluated when the database is updated. Integrity constraints are needed not only for specifying negative properties, as in the previous example, but also for stating disjunctive or existential conditions, as in the following examples, which state that at least one of two flights must be recorded (i.e. specified) in the database, and that there exists at least one day on which there is a flight, respectively:

$$flight(saturday, 0700, 0745, lh0345) \lor flight(saturday, 0735, 0810, lh0346)$$

$$\exists D\; [\, day(D) \land flight(D, 0700, 0745, lh0345) \,]$$
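Viewed as yes/no queries, the three constraints above can be evaluated directly against a set of facts. The Python sketch below assumes that flight facts are tuples and that times are HHMM integers so that they can be compared with 2300. The flight “lh9999” used in the usage note and all function names are hypothetical.

```python
# Facts flight(D, T1, T2, Nb) as tuples; times as HHMM integers (an assumption).
FLIGHTS = {
    ("saturday", 700, 745, "lh0345"),
    ("monday", 725, 900, "lh4356"),
}
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def no_late_landing(flights):
    # ∀ D T1 T2 Nb [ flight(D, T1, T2, Nb) ⇒ T2 ≤ 2300 ]
    return all(t2 <= 2300 for (_d, _t1, t2, _nb) in flights)


def one_of_two_recorded(flights):
    # flight(saturday, 0700, 0745, lh0345) ∨ flight(saturday, 0735, 0810, lh0346)
    return (("saturday", 700, 745, "lh0345") in flights
            or ("saturday", 735, 810, "lh0346") in flights)


def some_day_with_flight(flights, days):
    # ∃ D [ day(D) ∧ flight(D, 0700, 0745, lh0345) ]
    return any((d, 700, 745, "lh0345") in flights for d in days)


def check(flights):
    """Evaluate all three constraints as yes/no queries after an update."""
    return (no_late_landing(flights)
            and one_of_two_recorded(flights)
            and some_day_with_flight(flights, DAYS))
```

Inserting a hypothetical flight landing at 2330, e.g. `check(FLIGHTS | {("friday", 2250, 2330, "lh9999")})`, returns False, which is the violation that would be reported to the user.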

Although marketed database management systems can only maintain very limited types of integrity constraints (if at all!), normative specifications are important in all kinds of database applications. In current databases, integrity constraints are expressed and maintained through application programs, that is, outside the scope of the database system. This is undesirable because it makes the specification and the maintenance of integrity constraints a (generally complex) programming task: integrity constraints are not declaratively specified but are expressed by means of imperative programs, which moreover usually combine the specification of the normative conditions with their efficient evaluation. In deductive databases, in contrast, the specification of integrity constraints is part of the database design, for which tools should be available [16], and one only has to specify the integrity constraints. Their efficient evaluation is left to the database management system (cf. Section 4 below). This is not only more convenient for the database designer; it also ensures that integrity constraints are efficiently checked. This is hardly the case when application programs are modified to accommodate the modifications of integrity constraints that are unavoidable in any real-life application.

Range restriction is needed for integrity constraints as well as for deduction rules. A universal quantification $\forall X\, F[X]$ is range restricted if the expression $F[X]$ is of the form $R[X] \Rightarrow G[X]$ and if $X$ appears positively in $R[X]$ (cf. [10] for a precise definition). Thus $\forall X\, [\, p(X) \Rightarrow q(X) \,]$ is range restricted, while $\forall X\, [\, \neg p(X) \Rightarrow q(X) \,]$ is not. An existential constraint $\exists X\, F[X]$ is range restricted if $F[X]$ is of the form $R[X] \land G[X]$ and if $X$ appears positively in $R[X]$ [10]. Range restriction ensures that only updates affecting expressions occurring in a constraint (directly or indirectly through deduction rules) might violate this constraint. This is an essential condition for efficient integrity checking (cf. Section 4). It is worth noting that range restriction is a very natural requirement: in natural languages, it is almost impossible to express properties that are not range restricted. Moreover, formulas that are not range restricted have “semantically equivalent” counterparts that are range restricted.

2.4  Constraints  as  Rules 

Deduction rules can be used for expressing integrity constraints in two different ways. The first one consists of expressing quantifiers by means of rules; the second one consists of rewriting the integrity constraints as special rules. The following deduction rule expresses a range-restricted universal quantification:

forall(X, R => F) <- not (R, not F).

Consider for example the following universal formula: $\forall X\, [\, p(X) \Rightarrow q(X) \,]$. It would be expressed as “forall(X, p(X) => q(X))” using the formalism defined by the rule given above. This expression evaluates to true if and only if it is impossible to satisfy the conjunctive query “p(X), not q(X)”, i.e. to find a value X in the relation “p” which is not also in the relation “q”. The deduction rule given above thus specifies a constructive evaluation of range-restricted universally quantified expressions [12]. Existential quantifications are even easier to express in the formalism of deduction rules:


exists(X, F) <- F.

Instead of relying on the rules given above for quantifiers, one can also directly rewrite the integrity constraints as rules. An integrity constraint C is expressed as a rule, called a denial, corresponding to “false <- not C”. The examples of integrity constraints given above thus lead to the following denials:

false <- flight(D, T1, T2, Nb), T2 > 2300.
false <- not flight(saturday, 0700, 0745, lh0345),
         not flight(saturday, 0735, 0810, lh0346).
false <- not (day(D), flight(D, 0700, 0745, lh0345)).

The two approaches are in fact two sides of the same coin. The second representation is obtained from the first by partial evaluation (or partial deduction) [42, 35, 36, 37, 41] of the rules specifying quantifiers in the integrity constraints.
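The quantifier rules and the denial form can be mirrored in a few lines of Python. In this sketch, which assumes relations are Python sets, `forall` looks for a binding in the positive range R that fails F, exactly as “not (R, not F)” does. A denial “false <- Body” is violated as soon as its body has one solution, which is an existential test. All names are hypothetical.

```python
def forall(range_r, f):
    """forall(X, R => F) <- not (R, not F).

    range_r yields the bindings of X satisfying R (the positive range);
    f tests F for one binding. True iff no binding satisfies R but not F.
    """
    return not any(not f(x) for x in range_r)


def exists(solutions_of_f):
    """exists(X, F) <- F: true as soon as F has at least one solution."""
    return any(True for _ in solutions_of_f)


def denial_violated(body_solutions):
    """A denial false <- Body is violated iff its body has a solution."""
    return any(True for _ in body_solutions)


p = {"a", "b"}
q = {"a", "b", "c"}
```

Here `forall(p, lambda x: x in q)` holds, and so the corresponding denial “false <- p(X), not q(X)”, checked as `denial_violated(x for x in p if x not in q)`, is not violated. Adding “d” to p flips both answers.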




3  Query  Answering 


Queries are usually answered against the constructive specifications contained in the database, i.e. against the facts and deduction rules. Standard query answering methods do not make use of integrity constraints. Two complementary techniques can be applied in standard query answering: bottom-up (or forward) and top-down (or backward) reasoning. Bottom-up reasoning procedures basically consist of repeating the following as long as new facts are obtained: the bodies of all rules are evaluated against the stored facts and those already derived, and the corresponding facts specified by the heads (i.e. the left-hand sides) of the rules are added to the database (in a special area). Consider for example the following database, which can be interpreted as follows. “f(X, Y)” means that “X” is the father of “Y”; the odd (even, resp.) numbers are in a father-child relationship, and this relationship has cycles on letters (“a” and “b” as well as “c” and “d” are “fathers” of each other); “g(X, Y)” means that “X” and “Y” belong to the same generation.

g(X, Y) <- f(FX, X), g(FX, FY), f(FY, Y).

g(1, 2).   g(a, c).
f(1, 3).   f(3, 5).
f(2, 4).   f(4, 6).   f(6, 8).
f(a, b).   f(b, a).   f(c, d).   f(d, c).

The facts “g(1, 2)” and “g(a, c)” give rise to deriving “g(3, 4)”, “g(b, d)”, and “g(5, 6)” using the deduction rule. Bottom-up reasoning on this database leads to generating these facts in stages:


Stage 1:  g(3, 4)  g(b, d)
Stage 2:  g(3, 4)  g(b, d)  g(5, 6)  g(a, c)
Stage 3:  g(3, 4)  g(b, d)  g(5, 6)  g(a, c)  g(b, d)

The next round derives the same facts as those proven at stage 3. For restricting the repeated derivation of already proven facts, one can require that at least one of the facts produced at the previous stage be used in a proof. In the database community, this refined procedure is called the semi-naive method, while the straightforward, redundant method is called naive. The naive and semi-naive methods terminate as soon as no new facts are derived. It is not possible to completely avoid the repeated generation of some facts, for the same fact can have several distinct proofs. Using bottom-up reasoning for answering a query basically consists of generating all derivable facts from the database, and then evaluating the query against the resulting, extended set of facts. There are methods for restricting this “blind” generation to some extent and in some cases. However, it is an inherent feature of bottom-up reasoning not to make use of the posed query in trying to answer it: bottom-up reasoning is not “goal directed”. It is worth emphasizing that the naive and semi-naive methods compute sets at each stage and that set-oriented techniques from relational systems can be applied for computing these sets. An efficient processing of the quantifiers, negation, and disjunctions that are frequent in integrity constraints and deduction rules requires refining the traditional techniques of relational algebra [10].
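The naive and semi-naive methods can be sketched in Python for the father-generation database above, with facts as tuples and each stage computed as a set. The sketch assumes a single, linear rule (one g literal in the body). For such a rule, semi-naive evaluation only needs to feed the previous stage's new facts into that literal. Function names are hypothetical.

```python
FATHER = {(1, 3), (3, 5), (2, 4), (4, 6), (6, 8),
          ("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
G_STORED = {(1, 2), ("a", "c")}


def apply_rule(g_facts):
    """One application of g(X, Y) <- f(FX, X), g(FX, FY), f(FY, Y)."""
    return {(x, y)
            for (fx, fy) in g_facts
            for (f1, x) in FATHER if f1 == fx
            for (f2, y) in FATHER if f2 == fy}


def naive(stored):
    g = set(stored)
    while True:
        derived = apply_rule(g)   # re-derives every known fact at each stage
        if derived <= g:
            return g
        g |= derived


def semi_naive(stored):
    g = set(stored)
    delta = set(stored)
    while delta:
        # Only facts that were new at the previous stage are used in the
        # single g literal; facts already known are filtered out.
        delta = apply_rule(delta) - g
        g |= delta
    return g
```

Both procedures reach the same fixpoint, the two stored facts plus g(3, 4), g(b, d), and g(5, 6); the semi-naive version simply avoids feeding old facts back into the rule.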




Top-down reasoning procedures overcome this drawback by reasoning backward from the posed query. Consider once again the father-generation database given above and the query “g(3, X)”, asking for all Xs that are in the same generation as 3. The only solution is 4, since 1 and 2 are the fathers of 3 and 4, respectively, and themselves belong to the same generation. Reasoning backwards from the query “g(3, X)” consists of selecting a rule whose head unifies with the query. In our case, there is only one candidate rule. The unification of its head with the query binds the variables in its body, resulting in the following conjunctive subquery: f(FX, 3), g(FX, FY), f(FY, Y). The first conjunct “f(FX, 3)” has one single solution, which binds the variable FX to 1. The next conjunct (or subquery) to evaluate is “g(1, FY)”. It can be answered either against the facts, or by using the deduction rule once again, in which case the same process is repeated.
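The backward reasoning just described can be sketched as a Python generator that behaves like SLD resolution for this one rule: match the goal against the stored facts, then unify it with the rule head and solve the body from left to right. This is a simplified illustration with hypothetical names, assuming the first argument of g is bound.

```python
FATHER = {(1, 3), (3, 5), (2, 4), (4, 6), (6, 8),
          ("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
G_STORED = {(1, 2), ("a", "c")}


def solve_g(x):
    """Yield every Y with g(x, Y), reasoning backward from the goal g(x, Y)."""
    # First alternative: match the goal against the stored g facts.
    for (a, b) in G_STORED:
        if a == x:
            yield b
    # Second alternative: unify the goal with the rule head g(X, Y); the body
    # f(FX, x), g(FX, FY), f(FY, Y) is then solved from left to right.
    for (fx, child) in FATHER:
        if child == x:                  # f(FX, x) binds FX
            for fy in solve_g(fx):      # subquery g(FX, FY)
                for (father, y) in FATHER:
                    if father == fy:    # f(FY, Y) binds Y
                        yield y
```

`set(solve_g(3))` gives {4}, following exactly the steps described above. On the cyclic part of the data, however, a query such as `solve_g("a")` recurses without bound, which illustrates why plain SLD resolution can loop in the presence of recursive rules.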

Top-down reasoning can be formalized in terms of bottom-up reasoning by relying on the formalism of deduction rules as follows [13]:

fact(X) <- query(X), rule(X <- Y), answer(Y).
query(Y) <- query(X), rule(X <- Y).
query(Y1) <- query((Y1, Y2)).
query(Y2) <- query((Y1, Y2)), answer(Y1).
answer(X) <- query(X), fact(X).
answer((Y1, Y2)) <- query((Y1, Y2)), answer(Y1), answer(Y2).

Assume that these rules are evaluated bottom-up and that the predicate “fact” (“rule”, resp.) ranges over the facts (deduction rules, resp.) stored in the database. The first rule selects a deduction rule whose head unifies with a query and, if an “answer” (a predicate defined by other rules) is found for its body, generates a fact. In the formalism of deduction rules used here, unification does not have to be redefined: it is already provided by this formalism. The second rule generates a (generally conjunctive) query by unifying a query with the head of a rule. The third and fourth rules split conjunctive queries; the fifth rule derives answers from facts, and the last rule derives conjunctive answers by conjoining already generated answers. The query “g(3, X)” is answered as follows by processing the deduction rules given above with the semi-naive method:

Stage 1:  query( (f(ZX, 3), g(ZX, ZY), f(ZY, Y)) )

Stage 2:  query( f(ZX, 3) )

Stage 6:  answer( g(3, 1) )  query( (f(ZX, I), g(ZX, ZY), f(ZY, Y)) )
Stage 7:  query( (g(1, ZY), f(ZY, Y)) )

Stage 11: answer( g(3, 4) )

The rule-based specification of top-down reasoning given above is interesting for several reasons. Firstly, since it is expressed in terms of bottom-up reasoning, it is easily amenable to set-oriented computations. This is important for the sake of efficiency in databases. Secondly, the top-down procedure specified above is complete, more precisely exhaustive: if there are finitely many answers, it computes all of them and terminates; if there are infinitely many answers (in the presence of function symbols), each single answer is computed in finite time. The top-down reasoning procedure generally used in logic programming, SLD resolution, in contrast might loop in the presence of recursive deduction rules. Termination (or exhaustivity) is important in databases, for database users, as opposed to programmers, cannot be made responsible for the termination of the queries they pose to a database. Finally, the specification given above provides a simple formalization of the Alexander or Magic Set rewriting methods [1, 2, 57, 7, 55, 4, 21, 8]: these rewritings are obtainable from the rule-based specification given above by partial evaluation (or partial deduction) [42, 35, 36, 37, 41]. These points are discussed in more detail in [13]. [Cl] also shows, from a different angle, that the Alexander and Magic rewritings in fact implement top-down reasoning by means of deduction rules that are evaluated bottom-up.
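The effect of a Magic Set rewriting can be sketched on the generation example. The rewritten rules are written out by hand for this one rule and for queries whose first argument is bound; this is not a general rewriting algorithm. A “magic” predicate records which subqueries g(X, _) are needed, and every rule is guarded by it, so the bottom-up evaluation only touches facts relevant to the query. Names such as `magic_query_g` are hypothetical.

```python
FATHER = {(1, 3), (3, 5), (2, 4), (4, 6), (6, 8),
          ("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
G_STORED = {(1, 2), ("a", "c")}


def magic_query_g(x0):
    """Answer g(x0, Y) bottom-up over a hand-written magic-set rewriting.

    Rewritten program (first argument of g bound):
      magic_g(x0).
      magic_g(FX) <- magic_g(X), f(FX, X).
      g(X, Y) <- magic_g(X), g_stored(X, Y).
      g(X, Y) <- magic_g(X), f(FX, X), g(FX, FY), f(FY, Y).
    """
    magic = {x0}
    g = set()
    changed = True
    while changed:
        new_magic = {fx for (fx, x) in FATHER if x in magic}
        new_g = {(x, y) for (x, y) in G_STORED if x in magic}
        new_g |= {(x, y)
                  for (fx, x) in FATHER if x in magic
                  for (gx, fy) in g if gx == fx
                  for (f2, y) in FATHER if f2 == fy}
        changed = not (new_magic <= magic and new_g <= g)
        magic |= new_magic
        g |= new_g
    return magic, {y for (x, y) in g if x == x0}


magic, answers = magic_query_g(3)
```

For the query g(3, X), the magic set is only {3, 1}, so the cyclic facts on letters and the irrelevant fact g(5, 6) are never derived, and the answer set is {4}. Unlike the generator-based top-down sketch, this evaluation always terminates on finite data.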

We would like to conclude this section on deductive database query answering methods with some remarks. Firstly, although “goal directedness” is in general important for efficiency, there are cases where the overhead resulting from generating and managing subgoals does not pay off. In these cases, which still remain to be fully characterized, bottom-up reasoning with the semi-naive method is more efficient than a top-down procedure like the one specified above or Magic Set. Secondly, it is in some cases preferable to compute all derivable facts beforehand instead of generating the needed ones for each query at query time. In these cases as well, bottom-up reasoning with the semi-naive method is preferable. Finally, there have been proposals to use integrity constraints either for speeding up or for enhancing query answering (cf. e.g. [22, 46, 50, 14]). These approaches are very promising. They often give rise to more informative answers than conventional query answering methods.

4  Integrity  Checking 

Integrity constraints can be seen as yes/no queries (cf. Section 2.3). They can therefore be evaluated like regular queries. This is, however, often inefficient. Integrity constraints indeed need to be checked only after updates, and updates usually do not affect the whole of a database but only a limited part of it. For the sake of efficiency, it is desirable to check only those integrity constraints that might be affected by an update. Various integrity checking methods have been proposed that all rely on similar principles. Let us illustrate the techniques common to these so-called “integrity checking” methods on an example. Consider an integrity constraint requiring that all employees working for the sales department speak English:

$$\forall X\; [\, (empl(X) \land works\text{-}for(X, sales\text{-}dept)) \Rightarrow speaks(X, english) \,]$$

Any update to the facts and deduction rules that has no effect on the predicates occurring in this constraint cannot violate it. It is worth noting that this only holds if integrity constraints are range restricted. The insertion of any fact, say “p(a)”, might violate a non-range-restricted constraint such as $C: \forall X\, q(X)$. If “a” did not occur in the database before the insertion of “p(a)”, C indeed no longer holds after the change, since “a” now belongs to the database while “q(a)” is not derivable. Whether an update might affect the definition of a relation can be specified using deduction rules as follows:

potential-update(H, Sign) <- rule(H <- B), potential-update(B, Sign).
potential-update((C1, C2), Sign) <- potential-update(C1, Sign).
potential-update((C1, C2), Sign) <- potential-update(C2, Sign).
potential-update(not F, Opp-Sign) <- potential-update(F, Sign),
                                     opposite(Sign, Opp-Sign).
potential-update(F, +) <- insert(F).
potential-update(F, -) <- remove(F).

Let us comment on this specification, starting from the last two rules. The insertion (removal, resp.) of a fact F induces a “potential-update” on F with positive (negative, resp.) polarity. Negation changes the polarity of a potential update: for example, if “p(a)” is a potential removal, the negative information “not p(a)” is potentially inserted. The second and third rules specify that potential updates of conjuncts induce potential updates of conjunctions with the same polarity. The first rule propagates potential updates through deduction rules. Thus, if “p(a)” is inserted, the conjunction “(p(a), q(a))” is a potential insertion. In the presence of a rule “r(X) ← p(X), q(X).”, “r(a)” is in turn a potential insertion.
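The six rules above can be read top-down. The following Python sketch is one such reading under simplifying assumptions of our own, not the paper's: formulas are ground, an atom is a tuple such as ("p", "a"), a conjunction is ("and", F1, F2), a negation is ("not", F), and deduction rules are ground (head, body) pairs. The function name `potential_update` merely mirrors the predicate and is hypothetical.

```python
def opposite(sign: str) -> str:
    """The relation opposite(Sign, Opp-Sign)."""
    return "-" if sign == "+" else "+"


def potential_update(f, sign, rules, inserts, removes, path=frozenset()):
    """Top-down reading of the six potential-update rules for ground formulas.

    f: ("and", F1, F2), ("not", F) or a ground atom such as ("p", "a").
    rules: list of ground (head, body) pairs standing for rule(H <- B).
    inserts, removes: sets of ground atoms (the insert and remove relations).
    path: goals already on the current derivation path; used to cut cycles
    through recursive rules.
    """
    if (f, sign) in path:
        return False
    path = path | {(f, sign)}
    if f[0] == "and":  # rules 2 and 3: a potential update of either conjunct
        return (potential_update(f[1], sign, rules, inserts, removes, path)
                or potential_update(f[2], sign, rules, inserts, removes, path))
    if f[0] == "not":  # rule 4: negation flips the polarity
        return potential_update(f[1], opposite(sign), rules, inserts, removes, path)
    if sign == "+" and f in inserts:  # rule 5
        return True
    if sign == "-" and f in removes:  # rule 6
        return True
    # rule 1: propagate through every deduction rule whose head is f
    return any(potential_update(body, sign, rules, inserts, removes, path)
               for head, body in rules if head == f)


# The example from the text: r(a) <- p(a), q(a), and p(a) is inserted.
rules = [(("r", "a"), ("and", ("p", "a"), ("q", "a")))]
print(potential_update(("r", "a"), "+", rules, {("p", "a")}, set()))  # True
```

Removing p(a) instead makes “not r(a)” a potential insertion: `potential_update(("not", ("r", "a")), "+", rules, set(), {("p", "a")})` returns True.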

All integrity checking methods rely on analyses of the possible (or actual) consequences of updates, similar to the computation of potential updates specified above by means of deduction rules. This is quite intuitive when integrity constraints are expressed as denials, i.e. as rules with head “false”. Integrity checking then reduces to verifying whether “false” will become derivable after an update. Denials that cannot give rise to proving “false” can be filtered out by rules like those given above, for, assuming the database satisfied its constraints before the update, “false” is derivable after an update only if “potential-update(false, +)” holds.
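For a denial whose predicates are all stored, with no deduction rules involved, this filter collapses to a syntactic test. The update is relevant only if it inserts a fact matching a positive body literal or removes a fact matching a negated one. The sketch below assumes this rule-free case and a tuple encoding of atoms in which strings starting with an uppercase letter are variables. `matches` and `relevant` are hypothetical names. The denial encodes the sales-department constraint as false ← empl(X), works-for(X, sales-dept), not speaks(X, english).

```python
def matches(pattern, fact):
    """Does an atom pattern (variables start uppercase) match a ground fact?"""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return False
    binding = {}
    for term, value in zip(pattern[1:], fact[1:]):
        if term[:1].isupper():                      # a variable
            if binding.setdefault(term, value) != value:
                return False                        # bound inconsistently
        elif term != value:                         # a constant
            return False
    return True


def relevant(denial, inserts, removes):
    """Can the update make the body of `false <- denial` newly provable?

    denial: list of (positive, pattern) literals; inserts/removes: sets of facts.
    """
    for positive, pattern in denial:
        candidates = inserts if positive else removes
        if any(matches(pattern, fact) for fact in candidates):
            return True
    return False


# false <- empl(X), works-for(X, sales-dept), not speaks(X, english).
denial = [(True, ("empl", "X")),
          (True, ("works-for", "X", "sales-dept")),
          (False, ("speaks", "X", "english"))]
print(relevant(denial, {("speaks", "ann", "english")}, set()))  # False
print(relevant(denial, set(), {("speaks", "ann", "english")}))  # True
```

With deduction rules present, the same test would have to be applied to potential updates derived through the rules rather than to the stored updates alone.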

Depending on the method, the analyses either take the values of the attributes into account or ignore them. Some methods reason bottom-up, others top-down, on the deduction rules or on rules used for specifying the integrity constraints (cf. e.g. [43, 25, 39, 15, 17, 18], and [19] for an overview). Some methods, e.g. [53, 25, 15], simplify the integrity constraints with respect to updates. Such simplifications can be formalized as a partial evaluation (or partial deduction) [42, 35, 36, 37, 41] of deduction rules similar to those specified above.
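As an illustration of such a simplification (a sketch, not the specific method of any of the cited papers), inserting works-for(ann, sales-dept) only requires checking the constraint for X = ann. That is, it suffices to evaluate the residual denial false ← empl(ann), not speaks(ann, english) instead of the whole constraint. The Python below rests on four assumptions. Atoms are encoded as tuples in which uppercase strings are variables. The database has no deduction rules. The inserted fact binds every variable of the denial, so the residuals are ground. Finally, `simplify` and `violated` are hypothetical helpers.

```python
def match(pattern, fact, binding):
    """Extend binding so that pattern equals the ground fact, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    b = dict(binding)
    for term, value in zip(pattern[1:], fact[1:]):
        if term[:1].isupper():
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def substitute(pattern, binding):
    return (pattern[0],) + tuple(binding.get(t, t) for t in pattern[1:])


def simplify(denial, inserted):
    """Residual denials that the insertion of `inserted` could make provable."""
    residuals = []
    for i, (positive, pattern) in enumerate(denial):
        b = match(pattern, inserted, {}) if positive else None
        if b is not None:
            rest = denial[:i] + denial[i + 1:]
            residuals.append([(pos, substitute(p, b)) for pos, p in rest])
    return residuals


def violated(residual, db):
    """A ground residual denial is violated if its whole body holds in db."""
    return all((atom in db) == positive for positive, atom in residual)


denial = [(True, ("empl", "X")),
          (True, ("works-for", "X", "sales-dept")),
          (False, ("speaks", "X", "english"))]
new_fact = ("works-for", "ann", "sales-dept")
db = {("empl", "ann"), new_fact}                  # database after the update
checks = simplify(denial, new_fact)
print(checks)  # [[(True, ('empl', 'ann')), (False, ('speaks', 'ann', 'english'))]]
print(any(violated(r, db) for r in checks))       # True: ann does not speak English
```

Here the specialization step happens at update time. A partial evaluator could instead precompute, for each update pattern, the residual checks once and for all.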

In the rule-based specification of potential updates given above, we assume that updates are given as sets defined by the relations “insert” and “remove”. It is worth noting that these relations can be defined intensionally by deduction rules as well as extensionally by means of facts. One could for example specify an update by the following rule:


insert( speaks(X, english) ) ← nationality(X, british).

This rule is rather similar to the deduction rule “speaks(X, english) ← nationality(X, british)”. The difference is that it forces the explicit storage of facts in the database, while the deduction rule for “speaks” does not. Integrity constraints can also be defined on the predicates “insert” and “remove”. The following integrity constraint, for example, forbids firing employees who work for the sales department:

$\forall X\, [\, works\text{-}for(X, sales\text{-}dept) \Rightarrow \neg\, remove(empl(X))\, ]$
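To make these two rules concrete, here is a small Python sketch. It evaluates the intensional update rule bottom-up over a set of nationality facts and checks the constraint on “remove” before an update is applied. The encoding (sets of tuples) and the function names are hypothetical, and the sample facts are invented for illustration.

```python
def inserts_from_rule(nationality):
    """insert(speaks(X, english)) <- nationality(X, british)."""
    return {("speaks", x, "english") for x, n in nationality if n == "british"}


def remove_allowed(removes, works_for):
    """forall X [works-for(X, sales-dept) => not remove(empl(X))]."""
    return not any(("empl", x) in removes
                   for x, dept in works_for if dept == "sales-dept")


nationality = {("ann", "british"), ("bob", "french")}
works_for = {("ann", "sales-dept"), ("bob", "accounting")}
print(inserts_from_rule(nationality))                # {('speaks', 'ann', 'english')}
print(remove_allowed({("empl", "ann")}, works_for))  # False: ann works in sales
print(remove_allowed({("empl", "bob")}, works_for))  # True
```

Because the update is computed as a set before it is applied, the constraint on “remove” can be checked without touching the stored data.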




5  Deduction  Rules  for  Specifying  System  Components 

In the previous sections, we have outlined how query answering and integrity checking procedures can be specified by means of deduction rules. The technique used is known as meta-programming, because the variables in these deduction rules do not range, as in ordinary rules, over application data, but instead over expressions (i.e. integrity constraints or rules) that describe the application data. We have pointed out that rewriting methods used for answering queries and evaluating integrity constraints can be seen as resulting from the partial evaluation (or partial deduction) of rule-based specifications. In this section, we first argue that it is beneficial to specify and implement some components of a database management system in this way. Then, we suggest further applications of this approach.

A first advantage of specifying components of a database management system using deduction rules and partial evaluation is the uniformity of the approach. Instead of implementing separate rewriting methods for, say, recursive query processing (e.g. [1, 57, 2, 7, 4]), for simplifying integrity constraints (e.g. [43, 25, 15]), for query optimization (e.g. [46, 22, 10, 11]), etc., one could generate them automatically from the rule-based specifications using techniques such as those proposed in [58, 62, 42, 27, 35, 36, 37].

System  components  declaratively  specified  using  deduction  rules  would  probably 
be  easier  to  prove  correct  and  to  maintain  than  conventional  programs.  Moreover,  the 
very  maintenance  and  updating  tools  provided  by  the  database  management  system 
(e.g.  integrity  checking)  could  be  applied  to  maintaining  those  system  components 
that  are  specified  in  terms  of  deduction  rules. 

Specifying system components using deduction rules would also enhance the extensibility of the system. It is indeed easier to extend a set of deduction rules with additional rules for novel functionalities (e.g. additional query optimization strategies) than to extend a conventional program.

To what extent this approach is applicable to the design of database management systems is not yet known. The approach we suggest has, however, already been applied, more or less consciously, in many system prototypes developed in recent years, e.g. [41, 68]. From discussions with designers of various database system prototypes (e.g. [54, 3, 23, 26, 33, 34, 40, 66, 68, 20, 69]), we gained the impression that meta-programming techniques are rather widely applied, although often quite unconsciously, in implementing database systems. The systematic investigation of these techniques for database system design is, we think, a promising direction of research.

Deduction rules can also be used for specifying data models and query languages. It is widespread practice in logic programming to specify an interface model or language by means of rules. This can be done in deductive databases as well, either for specifying a semantic data model (e.g. an entity-relationship model) or for specifying a query language (e.g. an SQL-like language). Rules can also be used for mapping complex objects onto lower-level data structures. Deductive databases are often criticized for being, like relational databases, value-based, and for not providing object identities. Identities are “logical pointers” that make it possible to name objects [5, 6]. Extending logic programming and deductive database facilities with identities is a promising research issue. We think this is the key issue to be solved in order to bring the deductive and object-oriented database paradigms closer together.




Finally, deduction rules can also be used for interpreting the data stored in a database in various ways. Rules can be specified for the various forms of reasoning that some applications need (e.g. hypothetical or probabilistic queries). Non-standard query answering methods (e.g. [62, 22, 46, 14]) have often been specified using meta-programming techniques.

6  Conclusion 

This article has introduced and discussed the goals and techniques of deductive databases. We outlined how deductive databases make it possible to specify declaratively both the constructive and the normative aspects of an application, using deduction rules and integrity constraints, respectively.

We informally presented bottom-up and top-down set-oriented query answering methods, and we introduced the principles upon which integrity checking methods are based. We have shown that deduction rules are not only useful for specifying database applications, but can also serve to specify and implement components of a database management system.

We then argued that this approach is of interest for several reasons: it gives rise to a more uniform system design, system components implemented this way are easier to maintain, and system extensibility is improved.

Finally,  we  suggested  further  applications  of  this  approach  towards  enhanced 
database  systems. 

References 

1. Bancilhon, F., Maier, D., Sagiv, Y., Ullman, J.: Magic Sets and Other Strange Ways to Implement Logic Programs. Proc. 5th ACM SIGMOD-SIGART Symp. on Principles of Database Systems (1986)

2. Bancilhon, F., Ramakrishnan, R.: An Amateur’s Introduction to Recursive Query Processing. Proc. ACM SIGMOD Conf. on the Management of Data (1986)

3. Beierle, C.: Knowledge Based PPS Applications in PROTOS-L. Proc. 2nd Logic Programming Summer School (1992)

4. Beeri, C.: Recursive Query Processing. Proc. 8th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (1989) (tutorial)

5. Beeri, C.: A Formal Approach to Object-Oriented Databases. Data & Knowledge Engineering 5 (1990) (Invited paper. A preliminary version of this article appeared in the proc. of the 1st Int. Conf. on Deductive and Object-Oriented Databases)

6. Beeri, C.: Some Thoughts on the Evolution of Object-oriented Database Concepts. Proc. GI-Fachtagung Datenbanksysteme in Büro, Technik und Wissenschaft (1993)

7.  Beeri,  C.,  Ramakrishnan,  R.:  On  the  Power  of  Magic.  Proc.  6th  ACM  SIGACT- 
SIGMOD-SIGART  Symp.  on  Principles  of  Database  Systems  (1987) 

8. Bidoit, N.: Bases de Données Déductives. Armand Colin (1992) (in French)

9.  Bocca,  J.:  On  the  Evaluation  Strategy  of  Educe.  Proc.  ACM  SIGMOD  Conf.  on  the 
Management  of  Data  (1986) 

10. Bry, F.: Towards an Efficient Evaluation of General Queries: Quantifier and Disjunction Processing Revisited. Proc. ACM SIGMOD Conf. on the Management of Data (1989)

11.  Bry,  F.:  Logical  Rewritings  for  Improving  the  Evaluation  of  Quantified  Queries.  Proc. 
Int.  Conf.  Mathematical  Fundamentals  of  Data  Base  Systems  (1989) 




12. Bry, F.: Logic Programming as Constructivism: A Formalization and its Application to Databases. Proc. 8th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (1989)

13. Bry, F.: Query Evaluation in Recursive Databases: Bottom-up and Top-down Reconciled. Data & Knowledge Engineering 5 (1990) (Invited paper. A preliminary version of this article appeared in the proc. of the 1st Int. Conf. on Deductive and Object-Oriented Databases)

14. Bry, F.: Constrained Query Answering. Proc. Workshop on Non-Standard Queries and Answers (1991)

15. Bry, F., Decker, H., Manthey, R.: A Uniform Approach to Constraint Satisfaction and Constraint Satisfiability in Deductive Databases. Proc. 1st Int. Conf. on Extending Database Technology (1988)

16. Bry, F., Manthey, R.: Checking Consistency of Database Constraints: A Logical Basis. Proc. 12th Int. Conf. on Very Large Databases (1986)

17.  Bry,  F.,  Manthey,  R.:  Deductive  Databases  -  Tutorial  Notes.  6th  Int.  Conf.  on  Logic 
Programming  (1989) 

18. Bry, F., Manthey, R.: Deductive Databases - Tutorial Notes. 1st Int. Logic Programming Summer School (1992)

19.  Bry,  F.,  Manthey,  R.,  Martens,  B.:  Integrity  Verification  in  Knowledge  Bases.  Proc. 
2nd  Russian  Conf.  on  Logic  Programming  (1991)  (invited  paper) 

20. Cacace, F., Ceri, S., Crespi-Reghizzi, S., Tanca, L., Zicari, R.: Integrating Object-Oriented Data Modelling With a Rule-based Programming Paradigm. Proc. ACM SIGMOD Conf. on the Management of Data (1990)

21. Ceri, S., Gottlob, G., Tanca, L.: Logic Programming and Databases. Surveys in Computer Science, Springer-Verlag (1990)

22. Chakravarthy, U.S., Grant, J., Minker, J.: Foundations of Semantic Query Optimization for Deductive Databases. In [48] (1988)

23. Chimenti, D., Gamboa, R., Krishnamurthy, R., Naqvi, S., Tsur, S., Zaniolo, C.: The LDL System Prototype. IEEE Trans. on Knowledge and Data Engineering 2(1) (1990) 76-90

24.  Codd,  E.  F.:  A  Relational  Model  of  Data  for  Large  Shared  Data  Banks.  Comm.  ACM 
13  (1970)  377-387 

25.  Decker,  H.:  Integrity  Enforcement  on  Deductive  Databases.  Proc.  1st  Int.  Conf.  Expert 
Database  Systems  (1986) 

26. Freitag, B., Schütz, H., Specht, G.: LOLA - A Logic Language for Deductive Databases and its Implementation. Proc. 2nd Int. Symp. on Database Systems for Advanced Applications (1991)

27. Gallagher, J.: Transforming Logic Programs by Specializing Interpreters. Proc. European Conf. on Artif. Intelligence (1986) 109-122

28. Gallaire, H., Minker, J. (eds): Logic and Databases. Plenum Press (1978)

29. Gallaire, H., Minker, J., Nicolas, J.-M. (eds): Advances in Database Theory. Vol. 1. Plenum Press (1981)

30.  Gallaire,  H.,  Minker,  J.,  Nicolas,  J.-M.  (eds):  Advances  in  Database  Theory.  Vol.  2. 
Plenum  Press  (1984) 

31. Gallaire, H., Minker, J., Nicolas, J.-M. (eds): Logic and Databases: A Deductive Approach. ACM Computing Surveys 16:2 (1984)

32. Haas, L. M., Chang, W., Lohman, G. M., McPherson, J., Wilms, P. F., Lapis, G., Lindsay, B., Pirahesh, H., Carey, M., Shekita, E.: Starburst Mid-Flight: As the Dust Clears. IEEE Trans. on Knowledge and Data Engineering (1990) 143-160



33. Jarke, M., Jeusfeld, M., Rose, T.: Software Process Modelling as a Strategy for KBMS Implementation. Proc. 1st Int. Conf. on Deductive and Object-Oriented Databases (1989)

34. Kiernan, G., de Maindreville, C., Simon, E.: Making Deductive Databases a Practical Technology: A Step Forward. Proc. ACM SIGMOD Conf. on the Management of Data (1990)

35.  Komorowski,  J.:  Partial  Evaluation  -  Tutorial  Notes.  North  Amer.  Conf.  on  Logic 
Programming  (1989) 

36. Komorowski, J.: Synthesis of Programs in the Framework of Partial Deduction. Technical Report TR-81, Computer Science Depart. Åbo Akademi, Finland (1989)

37. Komorowski, J.: Towards Synthesis of Programs in the Framework of Partial Deduction. Proc. Workshop on Automating Software Design. 11th Int. Joint Conf. on Artif. Intelligence (1989)

38.  Komorowski,  J.:  Towards  a  Programming  Methodology  Founded  on  Partial  Deduction. 
Proc.  9th  European  Conf.  on  Artif.  Intelligence  (1990)  404-409 

39. Kowalski, R., Sadri, F., Soper, P.: Integrity Checking in Deductive Databases. Proc. 13th Int. Conf. on Very Large Databases (1987)

40. Lefebvre, A., Vieille, L.: On Query Evaluation in the DedGin* System. Proc. 1st Int. Conf. on Deductive and Object-Oriented Databases (1989)

41. Lei, L., Moll, G.-H., Kouloumdjian, J.: A Deductive Database Architecture Based on Partial Evaluation. SIGMOD Record 19(3) (1990) 24-29

42. Lloyd, J., Shepherdson, J. C.: Partial Evaluation in Logic Programming. Jour. of Logic Programming 11 (1991) 217-242

43. Lloyd, J. W., Sonenberg, E. A., Topor, R. W.: Integrity Constraint Checking in Stratified Databases. Jour. of Logic Programming 1(3) (1984)

44. Lloyd, J. W., Topor, R. W.: A Basis for Deductive Database Systems. Jour. of Logic Programming 2(2) (1985)

45. Lloyd, J. W., Topor, R. W.: A Basis for Deductive Database Systems II. Jour. of Logic Programming 3(1) (1986)

46.  Lobo,  J.,  Minker,  J.:  A  Metaprogramming  Approach  to  Semantically  Optimize  Queries 
in  Deductive  Databases.  Proc.  2nd  Int.  Conf.  Expert  Database  Systems  (1988) 

47. Martens, B., Bruynooghe, M.: Integrity Constraint Checking in Deductive Databases Using a Rule/Goal Graph. Proc. 2nd Int. Conf. Expert Database Systems (1988)

48.  Minker,  J.  (ed.):  Foundations  of  Deductive  Databases  and  Logic  Programming.  Morgan 
Kaufmann  (1988) 

49.  Moerkotte,  Karl,  S.:  Efficient  Consistency  Control  in  Deductive  Databases.  Proc.  2nd 
Int.  Conf.  on  Database  Theory  (1988) 

50.  Motro,  A.:  Using  Integrity  Constraints  to  Provide  Intensional  Responses  to  Database 
Queries.  Proc.  15th  Int.  Conf.  on  Very  Large  Databases  (1989) 

51. Morris, K., Ullman, J. D., Van Gelder, A.: Design Overview of the NAIL! System. Proc. 3rd Int. Conf. on Logic Programming (1986)

52.  Naqvi,  S.,  Tsur,  S.:  A  Logical  Language  for  Data  and  Knowledge  Bases.  Computer 
Science  Press  (1989) 

53. Nicolas, J.-M.: Logic for Improving Integrity Checking in Relational Databases. Acta Informatica 18(3) (1982)

54. Nicolas, J.-M., Yazdanian, K.: Implantation d’un Système Déductif sur une Base de Données Relationnelle. Research Report, ONERA-CERT, Toulouse, France (1982) (in French)

55. Ramakrishnan, R.: Magic Templates: A Spellbinding Approach to Logic Programming. Proc. 5th Int. Conf. and Symp. on Logic Programming (1988)




56. Ramakrishnan, R., Srivastava, D., Sudarshan, S.: CORAL: Control, Relation and Logic. Proc. Int. Conf. on Very Large Databases (1992)

57. Rohmer, J., Lescoeur, R., Kerisit, J.-M.: The Alexander Method. A Technique for the Processing of Recursive Axioms in Deductive Databases. New Generation Computing 4(3) (1986)

58.  Safra,  S.,  Shapiro,  E.:  Meta-interpreters  for  Real.  Information  Processing  86.  North- 
Holland  (1986)  271-278 

59. Sakama, C., Itoh, H.: Partial Evaluation of Queries in Deductive Databases. New Generation Computing 6 (1988) 249-258

60. Schmidt, H., Kiessling, W., Gunther, H., Bayer, R.: Compiling Exploratory and Goal-Directed Deduction Into Sloppy Delta-Iteration. Proc. Symp. on Logic Programming (1987)

61.  Seki,  H.:  On  the  Power  of  Alexander  Templates.  Proc.  8th  ACM  SIGACT-SIGMOD- 
SIGART  Symp.  on  Principles  of  Database  Systems  (1989) 

62. Sterling, L. S., Beer, R. D.: Meta-interpreters for Expert System Construction. Technical Report TR 86-122, Center for Automation and Intelligent System Research, Case Western Reserve Univ. (1986)

63. Takeuchi, A., Furukawa, K.: Partial Evaluation of Prolog Programs and its Application to Meta Programming. Information Processing 86. North-Holland (1986) 415-420

64. Tsur, S.: A (Gentle) Introduction to Deductive Databases. Proc. 2nd Int. Logic Programming Summer School (1992)

65.  Ullman,  J.  D.:  Principles  of  Database  and  Knowledge-Base  Systems.  Vol.  1  and  2. 
Computer  Science  Press.  (1988,  1989) 

66. Vaghani, J., Ramamohanarao, K., Kemp, D., Somogyi, Z., Stuckey, P.: The Aditi Deductive Database System. Proc. NACLP Workshop on Deductive Database Systems (1990)

67. Vieille, L.: Recursive Query Processing: The Power of Logic. Theoretical Computer Science 69(1) (1989)

68. Vieille, L., Bayer, P., Küchenhoff, Lefebvre, A.: EKS-V1: A Short Overview. Proc. AAAI-90 Workshop on Knowledge Base Management Systems (1990)

69.  Vieille,  L.:  A  Deductive  and  Object-Oriented  Database  System:  Why  and  How?  Proc. 
ACM  SIGMOD  Conf.  on  the  Management  of  Data  (1993) 


Combining  Classification  and 
Nonmonotonic  Inheritance  Reasoning: 
A  First  Step* 


Lin Padgham¹ and Bernhard Nebel²

¹ Department of Computer and Information Science
Linköping University, S-581 83 Linköping, Sweden
² German Research Center for Artificial Intelligence (DFKI)
Stuhlsatzenhausweg 3, D-6600 Saarbrücken 11, Germany
linpa@ida.liu.se nebel@dfki.uni-sb.de


Abstract. The formal analysis of semantic networks and frame systems led to the development of nonmonotonic inheritance networks and terminological logics. While nonmonotonic inheritance networks formalize the notion of default inheritance of typical properties, terminological logics formalize the notion of defining concepts and reasoning about definitions. Although it seems desirable to (re-)unify the two approaches, no such attempt has been made until now. In this paper, we take a first step in this direction by specifying a nonmonotonic extension of a simple terminological logic.


1  Introduction 

The formal analysis of early semantic network and frame formalisms led to the development of two different families of knowledge representation formalisms, namely, nonmonotonic inheritance networks [5] and terminological logics [12]. Nonmonotonic inheritance networks formalize the idea of default inheritance of typical properties. Terminological logics aim at formalizing the idea of defining concepts and reasoning with such definitions, for instance, determining subsumption relationships between concepts and instance relationships between objects and concepts. We will collectively refer to these two kinds of inferences as classification.

Although these two forms of representation and reasoning may seem to be incompatible [2], it would of course be desirable to combine them. From the point of view of nonmonotonic inheritance networks, it would be interesting to have a richer description language for specifying classes and properties and to add the ability to classify objects as belonging to some class. From the point of view

* This work has been supported by the Swedish National Board for Technical Development (STU) under grant # 9001669, by the Swedish Research Council for Engineering Sciences under grant # 900020, by the German Ministry for Research and Technology (BMFT) under research contract ITW 8901 8, and by the European Community as part of the ESPRIT Working Group DRUMS-II.




of  terminological  logics,  it  is  desirable  to  add  forms  of  reasoning  that  deal  with 
uncertain  information.  In  fact,  Doyle  and  Patil  [3]  argue  that  a  representation 
system  without  such  a  facility  is  useless. 

Proposals to integrate some form of default inheritance into terminological logics have existed since 1981 (see [14, 12]), and some terminological representation systems support forms of nonmonotonic inheritance (e.g. [12]). These systems, however, appear to combine the two modes of reasoning in a “naive” way, which leads to problems similar to the infamous “shortest path inference,” as we will see in Section 3.

An attempt to combine classificatory reasoning and nonmonotonic inheritance that avoids the latter problem has been made by Horty and Thomason [4]. Although this approach comes closest to our intention of combining nonmonotonic inheritance and classification, it has some problems, for instance, the “zombie path” problem [7], the lack of an algorithm, and the computational intractability of the approach.

More recent approaches combine classificatory and nonmonotonic reasoning either by integrating default logic into terminological logics [1] (without using specificity for conflict resolution, though) or by employing a form of preference semantics [13].

We will base our combination of inheritance reasoning and classification on the nonmonotonic inheritance reasoning approach by Padgham [10], which avoids the above-mentioned shortcomings. In the following sections we introduce a restricted terminological logic extended by defaults, and discuss how the inheritance theory in [10] can be extended to include classification.

2  A  Common  Representational  Base 

In order to describe our approach, we first introduce a representation formalism that can be conceived as a restricted terminological logic.

We start with a set $\mathcal{A}$ of atomic concepts (denoted by $A, A', \ldots$) and a set $\mathcal{F}$ of features (denoted by $F, F', \ldots$) that are intended to denote single-valued roles. Additionally, we assume a set $\mathcal{V}$ of values (denoted by $v, v'$) that are intended to denote atomic values from some domain. Based on this, complex concept expressions (denoted by $C, C'$) can be built:

$C \rightarrow \top \mid \bot \mid A \mid C \sqcap C' \mid F{:}v.$

In order to define new concepts completely or partially, terminological axioms (denoted by $\theta$) are used. Assertions (denoted by $\alpha$) are employed to specify properties of objects ($x, y, z, \ldots \in \mathcal{O}$):

$\theta \rightarrow A \sqsubseteq C \mid A \doteq C, \qquad \alpha \rightarrow x{:}C \mid F(x) = v.$

Knowledge bases are sets of such terminological axioms and assertions.

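The semantics of this language is given in the usual set-theoretic way. An interpretation $\mathcal{I}$ is a tuple $(D, V, \cdot^{\mathcal{I}})$, where $D$ and $V$ are arbitrary non-empty sets that are disjoint, and $\cdot^{\mathcal{I}}$ is a function such that

$\cdot^{\mathcal{I}}: (\mathcal{A} \rightarrow 2^{D}) \cup (\mathcal{F} \rightarrow 2^{D \times V}) \cup (\mathcal{V} \rightarrow V) \cup (\mathcal{O} \rightarrow D),$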




where we assume that the relation denoted by a feature is a partial function and that values and object identifiers satisfy the unique name assumption. Interpretations are extended to complex concept expressions in the usual way, e.g., $(C \sqcap C')^{\mathcal{I}} = C^{\mathcal{I}} \cap C'^{\mathcal{I}}$ and $(F{:}v)^{\mathcal{I}} = \{d \in D \mid (d, v^{\mathcal{I}}) \in F^{\mathcal{I}}\}$.

An interpretation is called a model of a knowledge base if all terminological axioms and assertions are satisfied by the interpretation in the obvious way, e.g., $A^{\mathcal{I}} = C^{\mathcal{I}}$ for $\theta = (A \doteq C)$. The specialization relationship between concepts (also called subsumption) and the instance relationship between object identifiers and concepts are defined in the obvious way. A concept $C$ is subsumed by $C'$, written $C \preceq C'$, iff $C^{\mathcal{I}} \subseteq C'^{\mathcal{I}}$ for all models $\mathcal{I}$ of the knowledge base. An object $x$ is an instance of a concept $C$, written $x{:}C$, iff for all models $\mathcal{I}$ it holds that $x^{\mathcal{I}} \in C^{\mathcal{I}}$.
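To make the semantics concrete, the following Python sketch computes the extension of a concept expression in one finite interpretation. It assumes that values are interpreted by themselves, which satisfies the unique name assumption, and it encodes expressions as tuples. The encoding and the names `extension` and `refutes_subsumption` are hypothetical. Since subsumption quantifies over all models, a single interpretation that is a model of the knowledge base can only refute a subsumption, never establish one.

```python
def extension(c, domain, atoms, features):
    """C^I for C -> top | bottom | A | C and C' | F:v in a finite interpretation.

    domain: the set D; atoms: dict A -> subset of D;
    features: dict F -> dict d -> value (a partial function from D to V).
    """
    kind = c[0]
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "atom":
        return set(atoms.get(c[1], set()))
    if kind == "and":
        return (extension(c[1], domain, atoms, features)
                & extension(c[2], domain, atoms, features))
    if kind == "feat":  # (F:v)^I = {d in D | (d, v) in F^I}
        _, f, v = c
        fn = features.get(f, {})
        return {d for d in domain if d in fn and fn[d] == v}
    raise ValueError(f"unknown concept expression: {c!r}")


def refutes_subsumption(c1, c2, domain, atoms, features):
    """True if this interpretation (assumed to be a model) has C1^I not within C2^I."""
    ext1 = extension(c1, domain, atoms, features)
    return not ext1 <= extension(c2, domain, atoms, features)


domain = {"x", "y"}
atoms = {"Elephant": {"x", "y"}}
features = {"color": {"x": "grey", "y": "yellow"}}
yellow_elephant = ("and", ("atom", "Elephant"), ("feat", "color", "yellow"))
print(extension(yellow_elephant, domain, atoms, features))  # {'y'}
```

In this interpretation `refutes_subsumption(("atom", "Elephant"), yellow_elephant, domain, atoms, features)` is True, so Elephant is not subsumed by Elephant ⊓ color:yellow whenever this interpretation is a model.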

In order to express that an instance of a concept $C$ typically has some additional properties, the syntax of terminological axioms is extended as follows:

$\theta \rightarrow A \sqsubseteq C / D_1, \ldots, D_n \mid A \doteq C / D_1, \ldots, D_n,$

where the $D_i$'s are again concept expressions. These “default properties” do not influence the set-theoretic interpretation of concepts, but are intended to denote that an instance of $A$ typically has the additional properties $D_i$. In terms of Padgham’s [10] type model, given an axiom $A \sqsubseteq C / D_1, \ldots, D_n$, $C$ represents the core of a type, while $C \sqcap D_1 \sqcap \ldots \sqcap D_n$ is the default of a type.

Using our simple representation formalism, we could classify concepts in the TBox, compute instance relationships between objects and concepts, and separately apply default inheritance in order to derive typical information about objects. In fact, this loose combination of classification and default inheritance was used profitably in a medical diagnosis application [15].

The network language we will use contains strict inheritance links “⇒”, strict negative links, and default inheritance links. In addition to the usual kind of nodes, depicted by a letter, we also allow for defined nodes, depicted by an encircled letter. The latter nodes are assumed to be defined by the conjunction of all nodes that are reachable by a single strict positive link. As an example, let us consider the following small knowledge base (inspired by [2]):

Elephant ⊑ Mammal / legs:4, color:grey
Hepatitis-Elephant ≐ Elephant ⊓ infected-by:Hepatitis / color:yellow
Yellow-Elephant ≐ Elephant ⊓ color:yellow
x:Elephant
infected-by(x) = Hepatitis

Using the abbreviations M, E, H, and Y for Mammal, Elephant, Hepatitis-Elephant, and Yellow-Elephant, respectively, and g, h, l, y for color:grey, infected-by:Hepatitis, legs:4, and color:yellow, respectively, the network diagram corresponding to our small knowledge base would look as in Figure 1.




3  Some  Problems  Combining  Defaults  with  Classification 

Using concept specificity for resolving conflicts among contradicting typical properties seems to be natural and desirable. Indeed, most proposals and already implemented systems seem to prefer this kind of conflict resolution. MacGregor, for instance, integrated a facility for “specificity-based defaults” [6, p. 393] into LOOM. The proposal by Pfahringer [12] is also an effort in this direction, employing a form of skeptical inheritance as defined in the area of nonmonotonic inheritance reasoning. From the limited descriptions of such approaches in the literature, they appear to combine the two modes of reasoning in what we will call a “naive” way, which can be described as follows. Given an object $x$ and a description $D_0$ of $x$, we first determine the set $S$ of most specialized concepts such that $x$ is an instance of all of the concepts in $S$. Based on this, we determine additional typical properties of $x$ using some inheritance strategy, which gives us a new (more specialized) description $D_i$, and we start the cycle again. We stop when a fixpoint is reached, i.e., when $D_i$ is equivalent to $D_{i-1}$.
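The naive cycle can be sketched in Python on the elephant knowledge base of Section 2. The encoding is hypothetical. Each concept maps to (defined?, core conjuncts, default conjuncts), where conjuncts are concept names or (feature, value) pairs. Specificity is crudely approximated by the number of concept names in a concept's strict closure. As inheritance strategy we simply let a default fill a feature only if no value has been assigned yet, visiting concepts from most to least specific. This is one possible strategy, not the one used by any cited system.

```python
KB = {
    "Mammal": (False, set(), set()),
    "Elephant": (False, {"Mammal"}, {("legs", 4), ("color", "grey")}),
    "Hepatitis-Elephant": (True, {"Elephant", ("infected-by", "Hepatitis")},
                           {("color", "yellow")}),
    "Yellow-Elephant": (True, {"Elephant", ("color", "yellow")}, set()),
}


def classify(desc):
    """Close desc under strict core inheritance and definitions A = C."""
    desc = set(desc)
    while True:
        new = set(desc)
        for name, (defined, core, _) in KB.items():
            if name in desc:
                new |= core                # A implies every conjunct of its core
            elif defined and core <= desc:
                new.add(name)              # A's whole definition holds
        if new == desc:
            return desc
        desc = new


def specificity(name):
    return sum(1 for c in classify({name}) if isinstance(c, str))


def inherit(desc):
    """Add defaults, most specific concepts first; a set feature is never reset."""
    desc = set(desc)
    names = sorted((c for c in desc if isinstance(c, str)),
                   key=lambda n: (-specificity(n), n))
    for name in names:
        for default in KB[name][2]:
            if isinstance(default, tuple) and any(
                    isinstance(c, tuple) and c[0] == default[0] for c in desc):
                continue                   # feature already has a value
            desc.add(default)
    return desc


def naive(desc):
    """Alternate classification and inheritance until a fixpoint is reached."""
    current = classify(desc)
    while True:
        nxt = classify(inherit(current))
        if nxt == current:
            return current
        current = nxt


result = naive({"Elephant", ("infected-by", "Hepatitis")})
print("Yellow-Elephant" in result, ("color", "grey") in result)  # True False
```

On this knowledge base the loop behaves as expected: x is classified as Hepatitis-Elephant, inherits color:yellow, and is then classified as Yellow-Elephant. Because `inherit` never retracts a value once assigned, however, a default reached in an early round blocks a conflicting default from a more specific concept that is only recognized in a later round.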

The main problem with the “naive” approach is that it leads to results resembling the infamous shortest path inference. This can be illustrated by considering Figure 2. If we begin by classifying $x$, we get $B$. Default reasoning then gives $b$, $c$, $d$, $\neg a$, and a further round of classification gives $C$. We would now want by default to believe $a$, but this is blocked because we already believe $b$ and $\neg a$. However, we observe that the default belief in $a$ comes from a more specific type ($C$) than the default belief in $b$ (which comes from $B$). We would therefore prefer to believe $a$ rather than $b$. However, we have previously committed to $b$ because we reached it first.

4  A  Default  Inheritance  Reasoning  Framework 

In this section we develop a formal framework for default inheritance reasoning. We will then generalize this in the following section so that it becomes a framework for combined classification and default reasoning. The framework that we develop is based on that presented in [10, 11]. The theory is very close to the skeptical inheritance theory of Horty et al. [5] in terms of the conclusions reached³, but instead of working with constructible, preempted and conflicted

³ It does not, however, exhibit the “zombie path” behavior criticized by Makinson and Schlechta [7].


136 


paths,  we  work  with  notions  of  default  assumptions,  conflicting  assumptions  and 
modification  of  assumptions  in  order  to  resolve  conflicts. 

Given some initial information and an inheritance net, we first assemble all the default assumptions that may be possible, given this starting point. We then find all the pairwise conflicting assumptions and resolve the conflicts, starting with the most specific nodes, by modifying one or both of the assumptions. Finally, we add all our modified (and now consistent) assumptions together to obtain our set of conclusions.

Our formalization is based on an inheritance network, $\Gamma$, which is derivable directly from terminological axioms and assertions as defined in Section 2, and on labellings, which are mappings from the nodes in the network (the set $N_\Gamma$) to values in the set $\{0, 1, -1, \top\}$. The intuitive interpretation of such a labelling $L$ is an information state concerning a hypothetical object, where $L(X) = 1$ means that the object is an instance of the concept $X$, $L(X) = -1$ means that the object is not an instance, $L(X) = 0$ means there is no information concerning the instance relationship, and $L(X) = \top$ means there is contradictory information. The set $\{0, 1, -1, \top\}$ forms a lattice w.r.t. information content such that $0 < 1 < \top$ and $0 < -1 < \top$. Similarly, the set of all labellings forms a lattice based on this ordering. In particular, the join of two labellings, written $L_1 \sqcup L_2$, corresponds to the combination of their information content. The special labelling $0$ is the labelling with all labels $0$.

A labelling $L$ is said to be consistent if it does not contain any node with a value of $\top$. A pair of labellings is said to be compatible if their join is consistent, and weakly compatible if their join does not introduce any inconsistency that is not already present in one of the individual labellings.
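The label lattice and the three compatibility notions are easy to state operationally. The sketch below is a hypothetical Python encoding, not taken from the paper: a labelling is a dict from node names to values, absent nodes are read as 0 (so the empty dict plays the role of the special labelling $0$), and the integer constant `TOP` stands in for $\top$.

```python
from typing import Dict

TOP = 2                     # stands in for the contradictory value
Labelling = Dict[str, int]  # node -> 0, 1, -1 or TOP; absent nodes count as 0


def value_leq(a: int, b: int) -> bool:
    """Information ordering on node values: 0 < 1 < TOP and 0 < -1 < TOP."""
    return a == b or a == 0 or b == TOP


def value_join(a: int, b: int) -> int:
    """Least upper bound of two node values; 1 joined with -1 gives TOP."""
    if value_leq(a, b):
        return b
    if value_leq(b, a):
        return a
    return TOP


def join(l1: Labelling, l2: Labelling) -> Labelling:
    """Pointwise join: combines the information content of two labellings."""
    joined = {n: value_join(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2)}
    return {n: v for n, v in joined.items() if v != 0}


def leq(l1: Labelling, l2: Labelling) -> bool:
    """Pointwise information ordering on labellings."""
    return all(value_leq(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2))


def consistent(l: Labelling) -> bool:
    return TOP not in l.values()


def compatible(l1: Labelling, l2: Labelling) -> bool:
    return consistent(join(l1, l2))


def weakly_compatible(l1: Labelling, l2: Labelling) -> bool:
    """Every TOP in the join was already TOP in one of the two labellings."""
    return all(l1.get(n, 0) == TOP or l2.get(n, 0) == TOP
               for n, v in join(l1, l2).items() if v == TOP)


elephant = {"E": 1, "g": 1}   # instance of E, grey
not_grey = {"g": -1}
print(compatible(elephant, not_grey))         # False
print(weakly_compatible(elephant, not_grey))  # False: the TOP at g is new
```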

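Definition 1 There is a strict positive path from $X$ to $Y$ in $\Gamma$, written $X \Rightarrow_\sigma Y$, iff
$$\exists W, Z:\ (X = W \lor [X \Rightarrow_\sigma W] \in \Gamma) \land [W \Rightarrow Z] \in \Gamma \land (Y = Z \lor [Z \Rightarrow_\sigma Y] \in \Gamma).$$

Definition 2 There is a strict negative path from $X$ to $Y$ in $\Gamma$, written $X \not\Rightarrow_\sigma Y$, iff
$$[X \not\Rightarrow Y] \in \Gamma \lor (\exists W: [X \not\Rightarrow W] \in \Gamma \land [Y \Rightarrow_\sigma W] \in \Gamma).$$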

We define two particular kinds of labellings for a node $X$: core labellings (written $X_c$) and default labellings (written $X_d$). A core labelling represents the necessary information for a node, while a default labelling for a node $X$ represents the information typically associated with $X$.

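Definition 3 A core labelling for $X$ (w.r.t. $\Gamma$), written $X_c$, is the minimal labelling which fulfills the following:
$$X_c(X) \geq 1; \text{ and for all } Y:\ ([X \Rightarrow_\sigma Y] \in \Gamma \rightarrow X_c(Y) \geq 1) \land ([X \not\Rightarrow_\sigma Y] \in \Gamma \rightarrow X_c(Y) \geq -1)$$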

Definition 4 A default labelling for $X$ (w.r.t. $\Gamma$), written $X_d$, is the minimal† labelling which fulfills the following:
$$X_d \geq X_c; \text{ and for all } Y:\ [X \rightarrow Y] \in \Gamma \rightarrow X_d \geq Y_c$$

† The ordering over labellings is the obvious one, given the ordering over node values.


Referring back to Figure 1, the core labelling for Hepatitis-Elephant would have the values $\{H = 1, h = 1, E = 1, M = 1$, all else $= 0\}$. Its default labelling would contain $\{H = 1, h = 1, E = 1, M = 1, y = 1, g = -1$, all else $= 0\}$.
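A small sketch of Definitions 1-4 reproduces the Hepatitis-Elephant labellings above. Figure 1 is not part of this section, so the links below ($H \Rightarrow h$, $H \Rightarrow E$, $E \Rightarrow M$, $H \rightarrow y$, $y \not\Rightarrow g$, $E \rightarrow g$, $E \rightarrow f$) are a hypothetical fragment chosen only so that the output matches the values quoted in the text. Strict positive paths are read as plain reachability over $\Rightarrow$ links, which is our interpretation of Definition 1, not a statement of the authors' algorithm.

```python
from typing import Dict, List, Set, Tuple

TOP = 2                      # contradictory value
Labelling = Dict[str, int]   # node -> 0, 1, -1 or TOP (absent means 0)

# Hypothetical fragment of Figure 1 (node letters as in the text).
STRICT_POS: Dict[str, List[str]] = {"H": ["h", "E"], "E": ["M"]}   # [X => Y]
STRICT_NEG: Set[Tuple[str, str]] = {("y", "g")}                     # [X =/> Y]
DEFAULT_POS: Dict[str, List[str]] = {"H": ["y"], "E": ["g", "f"]}  # [X -> Y]
NODES = ["H", "h", "E", "M", "y", "g", "f"]


def value_join(a: int, b: int) -> int:
    if a == b or a == 0:
        return b
    if b == 0:
        return a
    return TOP


def join(l1: Labelling, l2: Labelling) -> Labelling:
    return {n: value_join(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2)}


def strict_pos_reach(x: str) -> Set[str]:
    """All Y with X =>_sigma Y, read as reachability over strict positive links."""
    seen: Set[str] = set()
    stack = list(STRICT_POS.get(x, []))
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(STRICT_POS.get(y, []))
    return seen


def strict_neg(x: str, y: str) -> bool:
    """Definition 2: [X =/> Y], or [X =/> W] for some W with Y =>_sigma W."""
    return (x, y) in STRICT_NEG or any(
        a == x and w in strict_pos_reach(y) for (a, w) in STRICT_NEG)


def core(x: str) -> Labelling:
    """Definition 3: least labelling with X = 1, strict positives >= 1, strict negatives >= -1."""
    lab: Labelling = {x: 1}
    for y in NODES:
        if y in strict_pos_reach(x):
            lab = join(lab, {y: 1})
        if strict_neg(x, y):
            lab = join(lab, {y: -1})
    return lab


def default(x: str) -> Labelling:
    """Definition 4: least labelling above X_c and above Y_c for each default link [X -> Y]."""
    lab = core(x)
    for y in DEFAULT_POS.get(x, []):
        lab = join(lab, core(y))
    return lab


print(sorted(core("H").items()))     # [('E', 1), ('H', 1), ('M', 1), ('h', 1)]
print(sorted(default("H").items()))  # [('E', 1), ('H', 1), ('M', 1), ('g', -1), ('h', 1), ('y', 1)]
```

Note that $H_d$ picks up $g = -1$ only through the core of $y$; the elephant defaults for $g$ and $f$ belong to $E_d$, not to $H_d$.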

Referring again to Figure 1, we may wish to block that part of $E_d$ (the default elephant assumption) which concludes $g$ (grey), but allow a modified assumption which concludes $f$ (four legs). On the basis of default labellings we introduce the notion of a modified assumption (written $X_d'$).

We  define  a  correct  modified  assumption  which  intuitively  allows  removal  only 
of  entire  branches  from  the  full  default  assumption.  Correctness  ensures  both 
consistency  w.r.t.  the  network  and  also  that  (potentially)  dependent  properties 
are  treated  together. 

Figure  3  gives  some  examples  of  correct  and  incorrect  modified  assumptions.^ 


[Figure 3: Correct and Incorrect Assumption Modifications. Panels: full default; some correct modified assumptions; some incorrect modified assumptions.]

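Definition 5 A modified assumption, $X_d'$, is correct iff the following conditions hold for all $Y$ in $N_\Gamma$:

1. $[X \rightarrow Y] \in \Gamma \rightarrow (X_d'(Y) = 1 \rightarrow X_d' \geq Y_c)$

2. $X_d'(Y) = 1 \rightarrow (\exists Z: [X \rightarrow Z] \in \Gamma \land Z_c(Y) = 1) \lor X_c(Y) = 1$

3. $X_d'(Y) = -1 \rightarrow (\exists Z: [X \rightarrow Z] \in \Gamma \land X_d'(Z) = 1 \land Z_c(Y) = -1) \lor X_c(Y) = -1$

4. $X_d' \geq X_c$.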
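Definition 5 can be checked mechanically. In this hypothetical sketch the core labellings are passed in precomputed (they are the values the earlier definitions give for an assumed fragment of Figure 1, e.g. $y_c = \{y = 1, g = -1\}$), and the four conditions follow the reconstruction above, so the check is only as faithful as that reconstruction.

```python
from typing import Dict, List

TOP = 2
Labelling = Dict[str, int]  # node -> 0, 1, -1 or TOP (absent means 0)


def value_leq(a: int, b: int) -> bool:
    return a == b or a == 0 or b == TOP


def leq(l1: Labelling, l2: Labelling) -> bool:
    return all(value_leq(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2))


def is_correct(x: str, mod: Labelling, cores: Dict[str, Labelling],
               defaults: Dict[str, List[str]]) -> bool:
    """Check conditions 1-4 of Definition 5 (as reconstructed) for X_d' = mod."""
    xc, links = cores[x], defaults.get(x, [])
    for y in links:  # 1: a kept default branch Y brings all of Y_c with it
        if mod.get(y, 0) == 1 and not leq(cores[y], mod):
            return False
    for y, v in mod.items():
        # 2: positive values come from X_c or from the core of some default branch
        if v == 1 and xc.get(y, 0) != 1 and not any(
                cores[z].get(y, 0) == 1 for z in links):
            return False
        # 3: negative values come from X_c or from a branch that is actually kept
        if v == -1 and xc.get(y, 0) != -1 and not any(
                mod.get(z, 0) == 1 and cores[z].get(y, 0) == -1 for z in links):
            return False
    return leq(xc, mod)  # 4: X_d' >= X_c


# Hypothetical core labellings for part of Figure 1.
CORES = {
    "H": {"H": 1, "h": 1, "E": 1, "M": 1},
    "E": {"E": 1, "M": 1},
    "y": {"y": 1, "g": -1},
    "g": {"g": 1},
    "f": {"f": 1},
}
DEFAULTS = {"H": ["y"], "E": ["g", "f"]}

print(is_correct("E", {"E": 1, "M": 1, "f": 1}, CORES, DEFAULTS))           # True: grey branch dropped whole
print(is_correct("E", {"E": 1, "M": 1, "f": 1, "g": -1}, CORES, DEFAULTS))  # False: g = -1 is ungrounded
print(is_correct("H", {"H": 1, "h": 1, "E": 1, "M": 1, "y": 1}, CORES, DEFAULTS))  # False: y kept without g = -1
```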

When an assumption is modified, it is always modified with respect to some other information with which it is in conflict. We thus introduce the notion of a modified assumption as a pair of labellings consisting of the default assumption labelling for the node and a preference labelling for the node (written $P$). The preference labelling captures all of the information which is to be preferred over the default assumption at that type. While it can in principle be an arbitrary labelling, the preference labelling will, for all interesting theories, depend on both the type network and the initially given information. The preference labelling is not required to be consistent.

^ The concept of correctness is further motivated and explained in [10, pp. 188-189], where it is dealt with as two separate concepts: groundedness and consistency.

The value of the preference labelling for a node determines the modified assumption for that node. If the preference labelling for a node $X$ is not weakly compatible with $X_c$ (indicating preferred disbelief in the concept), then the modified assumption will be empty. Otherwise the modified assumption is a labelling between the core and the default, w.r.t. information content.

Definition 6 A modified assumption $X_w = (X_d, P)$ is $0$ iff $X_c$ is not weakly compatible with $P$; otherwise $X_w$ is the maximal labelling that is weakly compatible with $P$, is a correct modification of $X_d$, and satisfies $X_c \leq X_w \leq X_d$.
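For toy networks, Definition 6 can be evaluated by brute force. The sketch below enumerates every labelling between $X_c$ and $X_d$, keeps those that are correct (Definition 5 as reconstructed) and weakly compatible with $P$, and returns the unique maximal one, or `None` if there is no unique maximum. The elephant data and the preference labelling $P = \{g = -1\}$ are hypothetical, chosen to mirror the grey/four-legs example, and exhaustive enumeration is for illustration only.

```python
from itertools import product
from typing import Dict, List, Optional

TOP = 2
Labelling = Dict[str, int]  # node -> 0, 1, -1 or TOP (absent means 0)


def vleq(a: int, b: int) -> bool:
    return a == b or a == 0 or b == TOP


def vjoin(a: int, b: int) -> int:
    return b if vleq(a, b) else a if vleq(b, a) else TOP


def leq(l1: Labelling, l2: Labelling) -> bool:
    return all(vleq(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2))


def weakly_compatible(l1: Labelling, l2: Labelling) -> bool:
    """The join has no TOP that was not already present in l1 or l2."""
    for n in set(l1) | set(l2):
        a, b = l1.get(n, 0), l2.get(n, 0)
        if vjoin(a, b) == TOP and TOP not in (a, b):
            return False
    return True


def is_correct(x, mod, cores, defaults) -> bool:
    """Definition 5 (as reconstructed in the text), conditions 1-4."""
    xc, links = cores[x], defaults.get(x, [])
    if any(mod.get(y, 0) == 1 and not leq(cores[y], mod) for y in links):
        return False
    for y, v in mod.items():
        if v == 1 and xc.get(y, 0) != 1 and not any(cores[z].get(y, 0) == 1 for z in links):
            return False
        if v == -1 and xc.get(y, 0) != -1 and not any(
                mod.get(z, 0) == 1 and cores[z].get(y, 0) == -1 for z in links):
            return False
    return leq(xc, mod)


def modified_assumption(x: str, pref: Labelling, xd: Labelling,
                        cores: Dict[str, Labelling],
                        defaults: Dict[str, List[str]]) -> Optional[Labelling]:
    """Definition 6 by exhaustive search (exponential; illustration only)."""
    xc = cores[x]
    if not weakly_compatible(xc, pref):
        return {}  # the empty labelling 0
    nodes = sorted(set(xc) | set(xd))
    choices = [[v for v in (0, 1, -1, TOP) if vleq(xc.get(n, 0), v) and vleq(v, xd.get(n, 0))]
               for n in nodes]
    ok = []
    for values in product(*choices):
        cand = {n: v for n, v in zip(nodes, values) if v != 0}
        if is_correct(x, cand, cores, defaults) and weakly_compatible(cand, pref):
            ok.append(cand)
    maximal = [c for c in ok if not any(c != d and leq(c, d) for d in ok)]
    return maximal[0] if len(maximal) == 1 else None


# Hypothetical data mirroring the elephant example.
CORES = {"E": {"E": 1, "M": 1}, "g": {"g": 1}, "f": {"f": 1}}
DEFAULTS = {"E": ["g", "f"]}
E_D = {"E": 1, "M": 1, "g": 1, "f": 1}

print(modified_assumption("E", {"g": -1}, E_D, CORES, DEFAULTS))  # {'E': 1, 'M': 1, 'f': 1}
print(modified_assumption("E", {"E": -1}, E_D, CORES, DEFAULTS))  # {}
```

Preferring $g = -1$ removes the whole grey branch but keeps four legs; preferring disbelief in $E$ itself empties the assumption, as the paragraph before Definition 6 describes.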

By  joining  a  set  of  modified  assumption  labellings  for  a  given  network  we 
can  obtain  a  conclusion  labelling  for  that  network.  We  want  to  ensure  that 
the  preference  labellings  modify  the  default  assumptions  sufficiently  to  remove 
all  conflicts  so  that  we  can  obtain  a  consistent  conclusion  labelling.  Preference 
labellings  will  be  determined  by  the  structure  of  the  taxonomy  together  with  the 
initial  information,  using  principles  such  as  specificity. 

Each node in the network has its own preference labelling. We call this collection of preference labellings a preference map, written $\theta$. For a given network $\Gamma$ and a given initial labelling $\psi$, the preference map provides a preference labelling $\theta_X$ for each node $X$ in $N_\Gamma$. Different inheritance theories can be compared with respect to the characteristics of their preference maps. We will first characterize what we call a well-formed preference map, which can then be used as a base for defining preference maps for different kinds of theories, e.g. skeptical and credulous preference maps.

The characteristics that we capture in the definition of a well-formed preference map are that initial information is always preferred over default assumptions, that more specific information is always preferred over less specific information unless the more specific information is unsupported, and that only supported (or reachable) modified assumptions are non-empty.

Definition 7 A preference map $\theta$ is well-formed for a network $\Gamma$ and an initial labelling $\psi$ iff the following conditions are satisfied:

1. $\theta_X(X) < \top$, for all $X \in N_\Gamma$;

2. $\theta_X \geq \psi$, for all $X \in N_\Gamma$;

3. there exists a strict partial ordering $\prec$ such that for all $X \in N_\Gamma$: if $\theta_X(X) \leq 1$, then $\psi(X) = 1$ or there is a $Y \in N_\Gamma$ s.t. $Y \prec X$ and $Y_w(X) = 1$;

4. if $X$ is more specific§ than $Y$, then $X_w \leq \theta_Y$.

$\theta^{w}$ denotes the minimal well-formed preference map.¶

To characterize a skeptical preference map we require, in addition to well-formedness, that each pair of modified assumptions is either compatible under well-formed preference, or that the preference labelling for each includes the other (forcing modification of each w.r.t. the other).

§ For an exact definition of the notion of specificity used, see [10, p. 142].

¶ Proof of the existence of a unique minimal well-formed preference map is due to Ralph Rönnquist, and can be found in [10] (where it is called a revision function).

Definition 8 A preference map $\theta$ is skeptical for a network $\Gamma$ and an initial labelling $\psi$ iff $\theta$ is well-formed and, for all $X, Y \in N_\Gamma$, $(X_d, \theta^{w}_X) \sqcup (Y_d, \theta^{w}_Y)$ is consistent or ($\theta_X \geq Y_d$ and $\theta_Y \geq X_d$).
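Conditions 1, 2 and 4 of Definition 7 are direct comparisons, and condition 3 can be tested by building the required ordering bottom-up: nodes for which the condition is vacuous ($\theta_X(X) = -1$) or that are given in $\psi$ are placed first, and a node is placed once some already placed $Y$ has $Y_w(X) = 1$. The Python sketch below assumes that the modified assumptions $X_w$, the default labellings and the specificity pairs are supplied as inputs (specificity itself is defined in [10]); `skeptical_pairs` checks only the extra pairwise clause of Definition 8, with `wf_mod` standing for the modified assumptions under the minimal well-formed preference map. All node names and data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

TOP = 2
Labelling = Dict[str, int]     # node -> 0, 1, -1 or TOP (absent means 0)
LabMap = Dict[str, Labelling]  # node X -> a labelling associated with X


def vleq(a: int, b: int) -> bool:
    return a == b or a == 0 or b == TOP


def vjoin(a: int, b: int) -> int:
    return b if vleq(a, b) else a if vleq(b, a) else TOP


def leq(l1: Labelling, l2: Labelling) -> bool:
    return all(vleq(l1.get(n, 0), l2.get(n, 0)) for n in set(l1) | set(l2))


def compatible(l1: Labelling, l2: Labelling) -> bool:
    return all(vjoin(l1.get(n, 0), l2.get(n, 0)) != TOP for n in set(l1) | set(l2))


def well_formed(nodes: Set[str], psi: Labelling, theta: LabMap, mod: LabMap,
                more_specific: Iterable[Tuple[str, str]]) -> bool:
    """Conditions 1-4 of Definition 7 (as reconstructed); mod[X] plays X_w."""
    if any(theta[x].get(x, 0) == TOP for x in nodes):      # condition 1
        return False
    if not all(leq(psi, theta[x]) for x in nodes):         # condition 2
        return False
    # Condition 3: place nodes bottom-up; theta_X(X) = -1 makes it vacuous.
    placed = {x for x in nodes if theta[x].get(x, 0) == -1 or psi.get(x, 0) == 1}
    grew = True
    while grew:
        grew = False
        for x in nodes - placed:
            if any(mod[y].get(x, 0) == 1 for y in placed):
                placed.add(x)
                grew = True
    if placed != nodes:
        return False
    return all(leq(mod[x], theta[y]) for x, y in more_specific)  # condition 4


def skeptical_pairs(nodes: Set[str], theta: LabMap, xd: LabMap, wf_mod: LabMap) -> bool:
    """Only the extra pairwise clause of Definition 8."""
    return all(compatible(wf_mod[x], wf_mod[y])
               or (leq(xd[y], theta[x]) and leq(xd[x], theta[y]))
               for x in nodes for y in nodes)


# Hypothetical example: Q is given, Q -> R by default, S has no support.
NODES = {"Q", "R", "S"}
PSI = {"Q": 1}
THETA = {"Q": {"Q": 1}, "R": {"Q": 1, "R": 1}, "S": {"Q": 1, "S": -1}}
MOD = {"Q": {"Q": 1, "R": 1}, "R": {"R": 1}, "S": {}}
print(well_formed(NODES, PSI, THETA, MOD, [("Q", "R")]))  # True
print(well_formed(NODES, PSI, dict(THETA, S={"Q": 1}), dict(MOD, S={"S": 1}),
                  [("Q", "R")]))                           # False: S is unsupported
```

The second call shows the role of condition 3: a non-empty assumption at $S$ is only acceptable if something placed earlier supports $S$, or if the preference labelling forces it empty via $\theta_S(S) = -1$.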

5  Integrating  Classification  into  the  Framework 

As we saw in Section 3, we should not simply interleave correct classification and correct default reasoning, as this will lead to a certain arbitrariness in the results and will give problems analogous to the “shortest path” problems found in early approaches to default inheritance reasoning. We therefore take the approach of defining a single theory which includes both classificatory reasoning and default reasoning.

Condition 1 of Def. 7 for a well-formed preference map simply states that a preference labelling should not be inconsistent regarding the node for which it is a preference labelling. This is not affected by classification.

Condition 2 of Def. 7 captures that initial information should be preferred over all default assumptions. This is also a criterion which is clearly applicable to combined classificatory/default reasoning.

Condition 4 of Def. 7 says that we prefer assumptions associated with more specific nodes over those associated with less specific ones; it also seems appropriate to retain it unchanged.

The final condition of well-formedness (condition 3) has to do with ensuring that a default assumption is empty unless we believe in the base concept independently of it (and its results). Looking at Figure 4, and starting with information $F$, we clearly would not want to make any default assumptions regarding, for example, $C$ or $X$.

In default inheritance reasoning, the support in the ordering of condition 3 of well-formedness is shown by a labelling of 1 on a node in some “earlier” modified assumption. However, if we include classification as a valid means of reaching a conclusion, then support may come not only from single assumption(s) but from a set of assumptions which, taken together, provide the “evidence” for believing that type. In order to capture this formally, we define the notion of support.

Definition 9 A set of labellings $\Pi$ supports a node $X$ iff for some labelling $L \in \Pi$: $L(X) = 1$, or for all $Z \in N_\Gamma$: (($X_c(Z) = 1$ implies $\Pi$ supports $Z$) and ($X_c(Z) = -1$ implies that for all $L \in \Pi$: $L(Z) \leq -1$)).

Note that the support required for default reasoning is simply a special case of this definition of support. We can now rewrite the third condition of well-formedness as follows:

3'. there is a strict partial ordering $\prec$ on the nodes in $N_\Gamma$ such that for all $X \in N_\Gamma$: if $\theta_X(X) \leq 1$, then $\{\psi\}$ supports $X$ or there exists $\Pi$ s.t. $\Pi$ supports $X$ and for all $L \in \Pi$ there exists a node $Y \in N_\Gamma$ s.t. $L = Y_w$ and $Y \prec X$.

The additional criterion for a skeptical preference map ensures that any ambiguous conflicts which remain following the application of specificity for conflict resolution will result in bilateral modification of the conflicting assumptions. This appears to be as applicable to combined classification/default reasoning as it is to pure default inheritance reasoning. To illustrate the principle captured here, consider Figure 5. In the left-hand network, the default assumptions at $E$ and $B$ are both modified to avoid concluding … or $F$, $\neg G$, respectively.


[Figure 4: Support]

[Figure 5: Ambiguous Conflicts]


In the right-hand network, the default assumptions at $C$ and $B$ are modified, leading to no conclusion regarding $D$ and consequently no classification of $F$. However, because $(B, C)$ is an ambiguous conflict, the classification $F$ would be made in some credulous extension, and we therefore allow it to cause modification of the assumption at $E$ regarding $H$. Thus the conclusion for this figure will be $A, B, C, G, E$. We note that this is different from the extension given by Horty's method, which will also include $H$ and $\neg J$. $H$ and $\neg J$ are not in the intersection of credulous extensions, and thus we would argue that they should not be in the skeptical extension. This difference is a result of the different treatment of ambiguous conflict in the two methods, where our approach is what has been referred to as “ambiguity propagating”, in order to avoid the oddities of “zombie paths” [7].

6  Discussion 

We have shown how a minor change to a theory for default inheritance gives a theory of combined classification and default inheritance for a very restricted terminological logic. Whilst this language may be too restricted for real applications, it allows us to obtain a clearer understanding of the complex interaction between default inheritance and classification. This provides a firm basis on which we can begin to experiment with the addition of greater expressivity. It may well be necessary to limit the expressivity of terminological languages with defaults, not so much because of tractability problems (most terminological languages are already intractable [8]), but because of problems with “conceptual complexity”. However, there are certainly some applications which require defaults but whose other requirements on expressivity are limited [15]. The approach described in this paper provides a start for investigating languages and associated reasoning mechanisms for such applications.

The theory developed here is based on a skeptical inheritance theory with a polynomial algorithm. Considering the minor nature of the change required to incorporate classification into this theory, the algorithm should also be directly modifiable to include classification.

References 

1. F. Baader and B. Hollunder. Embedding defaults into terminological knowledge representation formalisms. In Nebel et al. [9], pages 306-317.

2. R. J. Brachman. ‘I lied about the trees’ or, defaults and definitions in knowledge representation. The AI Magazine, 6(3):80-93, 1985.

3. J. Doyle and R. S. Patil. Two theses of knowledge representation: Language restrictions, taxonomic classification, and the utility of representation services. Artificial Intelligence, 48(3):261-298, Apr. 1991.

4. J. F. Horty and R. H. Thomason. Boolean extensions of inheritance networks. In Proceedings of the 8th National Conference of the American Association for Artificial Intelligence, pages 633-639, Boston, MA, Aug. 1990. MIT Press.

5. J. F. Horty, R. H. Thomason, and D. S. Touretzky. A skeptical theory of inheritance in nonmonotonic semantic networks. In Proceedings of the 6th National Conference of the American Association for Artificial Intelligence, pages 358-363, Seattle, WA, July 1987.

6. R. MacGregor. The evolving technology of classification-based knowledge representation systems. In J. F. Sowa, editor, Principles of Semantic Networks, pages 385-400. Morgan Kaufmann, San Mateo, CA, 1991.

7. D. Makinson and K. Schlechta. Floating conclusions and zombie paths: Two deep difficulties in the “directly skeptical” approach to defeasible inheritance nets. Artificial Intelligence, 48(2):199-211, Mar. 1991.

8. B. Nebel. Terminological reasoning is inherently intractable. Artificial Intelligence, 43:235-249, 1990.

9. B. Nebel, W. Swartout, and C. Rich, editors. Principles of Knowledge Representation and Reasoning: Proceedings of the 3rd International Conference, Cambridge, MA, Oct. 1992. Morgan Kaufmann.

10. L. Padgham. Non-Monotonic Inheritance for an Object-Oriented Knowledge Base. PhD thesis, University of Linköping, Linköping, Sweden, 1989. Linköping Studies in Science and Technology, Dissertations No. 213.

11. L. Padgham. Defeasible inheritance: A lattice based approach. Computers and Mathematics with Applications, 23(6-9):527-541, 1992. Special Issue on Semantic Nets.

12. P. F. Patel-Schneider, B. Owsnicki-Klewe, A. Kobsa, N. Guarino, R. MacGregor, W. S. Mark, D. McGuinness, B. Nebel, A. Schmiedel, and J. Yen. Term subsumption languages in knowledge representation. The AI Magazine, 11(2):16-23, 1990.

13. J. Quantz and V. Royer. A preference semantics for defaults in terminological logics. In Nebel et al. [9], pages 294-305.

14. J. G. Schmolze and R. J. Brachman, editors. Proceedings of the 1981 KL-ONE Workshop, Cambridge, MA, 1982. Bolt, Beranek, and Newman Inc. BBN Report No. 4842.

15. T. Zhang and L. Padgham. A diagnosis system using inheritance in an inheritance net. In Proceedings of the Fifth International Symposium on Methodologies for Intelligent Systems, Knoxville, TN, Oct. 1990. North-Holland.


Mechanical  Proof  Systems  for  Logic  II, 
Consensus  Programs  and  Their  Processing 
(Extended  Abstract) 


Helena Rasiowa¹* and V. Wiktor Marek²**

¹ Institute of Mathematics, Warsaw University, Warsaw, Poland
² Computer Science Department, University of Kentucky, Lexington, KY 40506-0027


Abstract. We continue the investigations of [Ra90, Ra91, RM89] and study automated theorem proving for reasoning about the perception of reasoning agents and their consensus reaching. Using the techniques of [Ra91] and of logic programming ([Ap90, NS93]), we develop processing techniques for consensus programs.


1  Introduction 

Investigations concerning a systematic logical approach to reasoning about the knowledge of one or several intelligent agents have in recent years been developed by logicians and computer scientists. These investigations have led to a variety of different logical systems based on various paradigms. We mention here the work of Halpern and Moses ([HM84]), Fagin, Halpern, and Vardi ([FHV90]), Mazer ([Ma88]), and Orlowska ([Or90]), just to indicate to the reader that the issues of knowledge in the distributed environment are studied widely. The TARK proceedings ([Ha86, Va88, Pa90, Mo92]) include numerous papers devoted to the subject. The authors ([RM89]) proposed an approach to reasoning about the perception and the knowledge of groups of fully communicating agents, based on the point of view expressed in these principles:

1. The sharpness of an agent's perception depends on the agent's abilities.

2. The abilities of various agents may be comparable or not comparable (in the sense that one agent may be more capable in one situation, whereas another agent may be more capable in another situation).

3. An agent's knowledge about a reality is only approximate.

4. An agent's knowledge about a predicate (property) $p$ can be reflected by her perception of $p$, that is, by the characteristic features of $p$ acquired in a process of collecting information, conducting research, etc.

An agent's knowledge about a property $p$ can be reflected in her ability to recognize various features and attributes of objects. This may require access to a specific recognition medium such as a database, laboratory, test, etc.

* Work partially supported by Polish Government grant KBN 2 2051 91 02. E-mail: hrasiowa@mimuw.edu.pl

** Work partially supported by the U.S. National Science Foundation grant IRI-9012902. E-mail: marek@cs.uky.edu


Starting with this intuition, the authors ([RM89]) developed a logic with two types of connectives: perception connectives and knowledge connectives. Perception connectives correspond to the approximation connectives of predicate logic ([Ra87, Ra88, Ra90]). The knowledge connectives correspond to those of the logic of knowledge of Orlowska ([Or90]). The logics without knowledge operators, but with the perception connectives only, have been reexamined by Rasiowa ([Ra91]) under the name of perception logics. Rasiowa ([Ra91]) established an automated theorem proving technique for such logics. A similar approach has been proposed by Fitting ([Fi92]).

In this paper we look again at the perception logics (that is, we leave the knowledge part of the language unattended) and prove several results relating the ordinary resolution method of Robinson ([Ro65]) and the form of resolution discussed in [Ra91].

Our approach here is based on a different technique, much more closely connected to the current presentations of resolution. In this we follow the current texts by Apt ([Ap90]) and Nerode and Shore ([NS93]). The technique used in [Ra91] employed a variant of an argument used by Orlowska in her investigations on resolution for multivalued logics.

Space restrictions force us to omit the proofs of the results of this paper. We also had to omit the section dealing with the lifting of our results to the predicate calculus case. These results will be included in the full journal presentation.


2  Perception  logics  and  resolution 

Let $T$ be a finite set (of reasoning agents). The set $T$ is endowed with a partial ordering $\leq_T$. Intuitively, $s \leq_T t$ means that the agent $t$ is more perceptive than the agent $s$. This is interpreted as follows. When both the agents $s$ and $t$ are asked about a specific fact $p$, the agent $t$ will be less gullible: her abilities are better in recognizing whether the fact $p$ really happens. Specifically, $t$ can find that $p$ did not, actually, happen (whereas $s$ perceives $p$ as true). In particular, when $s \leq_T t$ and $p$ is an atomic statement, then whenever $t$ perceives $p$ to be true, $s$ perceives $p$ to be true. In other words, the property $s \leq_T t$ means that this sharper perception holds for all possible facts $p$. Notice that a similar property is called “dominance” in Fitting ([Fi92]). All agents observe the same reality. This means that each agent is endowed with a valuation of the set of atoms $At$ (in the propositional case) or with a relational structure (with the same underlying algebra, that is, with the same objects and the same interpretation of the function symbols and the constants) in the predicate case.

Denoting by $V_t$ the valuation assigned to the agent $t$, the requirement of better perception for a stronger agent is formally expressed as

$$w \leq_T t \implies \forall_{p \in At}\ V_t(p) \leq V_w(p) \qquad (1)$$

In the predicate case, denoting by $\mathcal{A}_t$ the relational system assigned as the perception of the agent $t$, we have, for every predicate letter $p$,

$$w \leq_T t \implies p^{\mathcal{A}_t} \subseteq p^{\mathcal{A}_w} \qquad (2)$$

Formally, the language of perception logic is the language of classical logic extended by unary modal operators $d_t$, for $t \in T$. Intuitively, $d_t\varphi$ means that the agent $t$ (and, as will turn out, all the agents with weaker perception) perceives $\varphi$.

The semantics for our language is determined by the consensus interpretation. In the predicate case a $T$-reality for the underlying language is a collection of first order relational structures. The collection is indexed by the set $T$. All the structures in a $T$-reality have the same underlying algebra, and the $T$-reality must satisfy Condition (2).

Analogously, we define the notion of $T$-reality for the propositional case. It is a collection of valuations of the underlying set of atoms. Moreover, we require Condition (1).

Once it is clear what we mean by the realities for our language, we define the notion of satisfaction. The fact that a $T$-reality $\mathcal{M} = (\mathcal{A}_t)_{t \in T}$ satisfies $\varphi$ means that consensus about $\varphi$ has been reached. This, of course, implies that the set of formulas true in a $T$-reality is not complete. The notion of satisfaction for formulas is defined in a roundabout way. We give the full definition of satisfaction for the predicate case. The propositional case can easily be described by an obvious modification of clause (a) and by eliminating the quantifier cases (g) and (h). First we define the relation $\mathcal{M} \models d_t(\varphi)[v]$ (where $v$ is a valuation of variables).

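(a) $\mathcal{M} \models d_t(p(\tau_1, \ldots, \tau_m))[v]$ iff $\mathcal{A}_t \models p(\tau_1, \ldots, \tau_m)[v]$

(b) $\mathcal{M} \models d_t(\varphi \lor \psi)[v]$ iff $\mathcal{M} \models d_t(\varphi)[v]$ or $\mathcal{M} \models d_t(\psi)[v]$

(c) $\mathcal{M} \models d_t(\varphi \land \psi)[v]$ iff $\mathcal{M} \models d_t(\varphi)[v]$ and $\mathcal{M} \models d_t(\psi)[v]$

(d) $\mathcal{M} \models d_t(\varphi \Rightarrow \psi)[v]$ iff for all $s \leq_T t$, $\mathcal{M} \models d_s(\varphi)[v]$ implies $\mathcal{M} \models d_s(\psi)[v]$

(e) $\mathcal{M} \models d_t(\neg\varphi)[v]$ iff for all $s \leq_T t$, not $\mathcal{M} \models d_s(\varphi)[v]$

(f) $\mathcal{M} \models d_t(d_s(\varphi))[v]$ iff $\mathcal{M} \models d_s(\varphi)[v]$

(g) $\mathcal{M} \models d_t(\forall x\, \varphi)[v]$ iff for all $a \in |\mathcal{A}|$, $\mathcal{M} \models d_t(\varphi)[v(x/a)]$

(h) $\mathcal{M} \models d_t(\exists x\, \varphi)[v]$ iff there exists $a \in |\mathcal{A}|$ such that $\mathcal{M} \models d_t(\varphi)[v(x/a)]$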

Next, we define satisfaction for all formulas of the underlying language $\mathcal{L}$:

$$\mathcal{M} \models \psi[v] \text{ if and only if } \mathcal{M} \models d_t(\psi)[v] \text{ for all } t \in T.$$
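The propositional clauses (a) to (f) translate directly into a recursive evaluator. In this hypothetical Python sketch a $T$-reality consists of a set of agents, an ordering given as pairs `(s, t)` meaning $s \leq_T t$ (assumed already transitively closed; reflexive pairs are added), and, for each agent, the set of atoms she perceives as true; formulas are nested tuples. `is_reality` checks Condition (1).

```python
from typing import Dict, Set, Tuple

# Formulas as nested tuples:
#   ("atom", p) | ("not", f) | ("and", f, g) | ("or", f, g) | ("imp", f, g) | ("d", s, f)
Formula = tuple


class TReality:
    """A propositional T-reality: agents, the ordering <=_T, one valuation per agent."""

    def __init__(self, agents: Set[str], order: Set[Tuple[str, str]],
                 val: Dict[str, Set[str]]):
        self.agents = agents
        self.order = order | {(a, a) for a in agents}  # (s, t) means s <=_T t
        self.val = val                                  # agent -> atoms perceived true

    def below(self, t: str):
        return [s for s in self.agents if (s, t) in self.order]

    def is_reality(self) -> bool:
        """Condition (1): w <=_T t implies V_t(p) <= V_w(p) for every atom p."""
        return all(self.val[t] <= self.val[w] for (w, t) in self.order)

    def d(self, t: str, f: Formula) -> bool:
        """M |= d_t(f), following clauses (a)-(f)."""
        tag = f[0]
        if tag == "atom":
            return f[1] in self.val[t]
        if tag == "or":
            return self.d(t, f[1]) or self.d(t, f[2])
        if tag == "and":
            return self.d(t, f[1]) and self.d(t, f[2])
        if tag == "imp":
            return all(not self.d(s, f[1]) or self.d(s, f[2]) for s in self.below(t))
        if tag == "not":
            return all(not self.d(s, f[1]) for s in self.below(t))
        if tag == "d":
            return self.d(f[1], f[2])
        raise ValueError(f"unknown connective {tag!r}")

    def sat(self, f: Formula) -> bool:
        """M |= f iff M |= d_t(f) for every agent t (consensus)."""
        return all(self.d(t, f) for t in self.agents)


p = ("atom", "p")
M = TReality({"s", "t"}, {("s", "t")}, {"s": {"p"}, "t": set()})  # s <=_T t; only s perceives p
print(M.is_reality())                                # True
print(M.sat(p), M.sat(("not", p)))                   # False False: no consensus either way
print(M.sat(("imp", ("d", "t", p), ("d", "s", p))))  # True, an instance of property (7)
```

With two agents $s \leq_T t$ where only $s$ perceives $p$, neither $p$ nor $\neg p$ (nor $p \lor \neg p$) is satisfied, which illustrates why the set of true formulas is not complete.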

A complete axiomatization for the relation $\models$ has been given in [RM89]. Here is a short list of fundamental properties of the satisfaction (consensus) relation for a $T$-reality $\mathcal{M}$:

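$$\mathcal{M} \models \neg d_t(\varphi)[v] \iff \text{not } \mathcal{M} \models d_t(\varphi)[v] \qquad (3)$$

$$\mathcal{M} \models d_w(\neg d_t(\varphi))[v] \iff \mathcal{M} \models \neg d_t(\varphi)[v] \qquad (4)$$

$$\mathcal{M} \models (d_s(\varphi) \Rightarrow d_t(\psi))[v] \iff \text{not } \mathcal{M} \models d_s(\varphi)[v] \text{ or } \mathcal{M} \models d_t(\psi)[v] \qquad (5)$$

$$w \leq_T t \text{ and } \mathcal{M} \models d_t(\varphi)[v] \text{ imply } \mathcal{M} \models d_w(\varphi)[v] \qquad (6)$$

$$w \leq_T t \text{ implies } \mathcal{M} \models (d_t(\varphi) \Rightarrow d_w(\varphi))[v] \qquad (7)$$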

Let us introduce the notion of a $D$-atom. It is a formula of the form $d_t\, p(s_1, \ldots, s_r)$, where $t \in T$ and $s_1, \ldots, s_r$ are terms of the language (in the propositional case no term is present). A $D$-literal is a $D$-atom or its negation. A $D$-clause is a disjunction of $D$-literals.

Summing up all properties of formulas of the form $d_t\varphi$, we quote the following result of [Ra91]:

Proposition 1. For every formula of the form $d_t\varphi$ there is a finite set of $D$-clauses $S$ such that $d_t\varphi$ and $S$ are equisatisfiable. That is, there is a $T$-reality $\mathcal{M}$ satisfying $d_t\varphi$ if and only if there is a $T$-reality $\mathcal{M}'$ such that $\mathcal{M}'$ satisfies all the $D$-clauses from $S$.


Proposition 1 suggests that we may be facing a situation similar to that encountered in automated theorem proving: a version of the resolution principle may work here. Indeed, this is the case. In [Ra91] Rasiowa found a variant of the resolution principle suitable for our context. We will refer to this rule as the $T$-resolution rule and denote it by $\mathrm{res}_T$.


$$\frac{A' \cup \{d_t\, p(t_1, \ldots, t_k)\} \qquad A'' \cup \{\neg d_w\, p(s_1, \ldots, s_k)\}}{(A' \cup A'')\theta} \quad \text{providing } w \leq_T t \qquad (8)$$

Notice the asymmetry of the rule $\mathrm{res}_T$: its applicability is restricted by the condition $w \leq_T t$. Here we assume that the parent clauses are standardized apart, and $\theta$ is a most general unifier of the atoms $p(t_1, \ldots, t_k)$ and $p(s_1, \ldots, s_k)$.

As usual, by a Herbrand $T$-reality we mean a $T$-reality whose underlying universe consists of the ground terms of the language.

The following result is proved in [Ra91]:


Proposition 2. Let $S$ be a set of $D$-clauses. Then there is a $T$-reality satisfying $S$ if and only if the closure of $S$ under $T$-resolution does not contain the empty clause.


3  Automated  theorem  proving  for  consensus  reaching 

We will prove a basic result on the relationship between the asymmetric resolution rule introduced in [Ra91] and the usual resolution rule (for the best description see Nerode and Shore [NS93]).

Let $(T, \leq_T)$ be a poset, and let $At$ be a set of atoms. Recall that a $D$-atom is an expression of the form $d_t(p)$ where $p \in At$ and $t \in T$. Similarly, a $D$-literal is a $D$-atom or its negation. Next, a $D$-clause is a finite set of $D$-literals. As usual, $\Box$ is the empty clause (interpreted as falsity).

In our context, the $T$-resolution rule is the following rule of proof:

$$\frac{A' \cup \{d_t(p)\} \qquad A'' \cup \{\neg d_w(p)\}}{A' \cup A''}\ \mathrm{res}_T$$

where $w \leq_T t$.

The ordinary resolution rule in our setting takes this form:

$$\frac{A' \cup \{d_s(p)\} \qquad A'' \cup \{\neg d_s(p)\}}{A' \cup A''}\ \mathrm{res}$$

The $T$-resolution rule is asymmetric. We can resolve on a $D$-atom $d_t(p)$
against $\neg d_w(p)$ only if $w \le_T t$. Hence, the $D$-clauses $\{d_w(p)\}$ and $\{\neg d_t(p)\}$ (with $w <_T t$)
do not entail $\Box$. This agrees with our paradigm: since $w <_T t$, the atom $p$
may be perceived by the agent $w$ as true and by the agent $t$ as false without
contradiction.
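To make the asymmetry concrete, here is a minimal propositional sketch of a single $\mathit{res}_T$ step. The encoding is an assumption of this illustration, not Rasiowa's notation. A $D$-literal is a tuple `(positive, agent, atom)`, and a $D$-clause is a `frozenset` of such tuples. The strict part of $\le_T$ is given as a transitively closed set of pairs `(w, t)`. The helper names `leq` and `t_resolvents` are hypothetical.

```python
# Hypothetical encoding: (True, "t", "p") stands for d_t(p), (False, "w", "p") for ¬d_w(p).

def leq(order, w, t):
    """w <=_T t; `order` holds the strict pairs (w, t), assumed transitively closed."""
    return w == t or (w, t) in order

def t_resolvents(c1, c2, order):
    """All res_T resolvents taking d_t(p) from c1 and ¬d_w(p) from c2, provided w <=_T t."""
    result = set()
    for positive, t, p in c1:
        if not positive:
            continue
        for other_positive, w, q in c2:
            if not other_positive and q == p and leq(order, w, t):
                result.add((c1 - {(True, t, p)}) | (c2 - {(False, w, q)}))
    return result

order = {("w", "t")}  # w <_T t: agent t is more perceptive than agent w
print(t_resolvents(frozenset({(True, "t", "p")}), frozenset({(False, "w", "p")}), order))
# {frozenset()}: the empty clause
print(t_resolvents(frozenset({(True, "w", "p")}), frozenset({(False, "t", "p")}), order))
# set(): no resolvent, so {d_w(p)} and {¬d_t(p)} are compatible
```

Passing `order=set()` makes `leq` plain equality, which recovers the ordinary rule $\mathit{res}$.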

Let $S$ be a set of $D$-clauses. By $\mathcal{R}_T(S)$ we mean the closure of the set $S$ under
the rule $\mathit{res}_T$. Similarly, $\mathcal{R}(S)$ is the closure of $S$ under the usual resolution rule.
Notice that $\mathit{res}_T$ is a generalization of the ordinary resolution rule. Indeed, every
derivation using ordinary resolution is a valid $\mathit{res}_T$ derivation, because
the relation $\le_T$ is reflexive. As noticed above, the converse does not necessarily
hold; that is, a refutation using $T$-resolution need not be a resolution
refutation.

Denote by $D_T$ the diagram of $T$, that is, the following set of $D$-clauses:

$$D_T = \{\{\neg d_t(p),\ d_w(p)\} : w \le_T t,\ p \in At\}$$

A more intuitive representation of $D_T$ is:

$$D_T = \{d_t(p) \Rightarrow d_w(p) : w \le_T t,\ p \in At\}$$

The set $D_T$ codifies our knowledge about the relationship between the agents. If
the agent $t$ is more perceptive than the agent $w$ (which in our system is encoded
by $w \le_T t$), and $t$ accepts a fact $p$, then, certainly, $w$ accepts the fact $p$, but not
vice versa.
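The diagram can be generated mechanically. The sketch below uses the same hypothetical tuple encoding, where `(positive, agent, atom)` is a $D$-literal. It builds $D_T$ for a two-agent ordering and checks the reading just given: a valuation, taken as the set of `(agent, atom)` pairs that hold, satisfies $D_T$ exactly when every fact perceived by $t$ is also perceived by each $w \le_T t$. The function names are illustrative only. The reflexive pairs $w = t$ contribute tautologies.

```python
from itertools import product

def leq(order, w, t):
    """w <=_T t; `order` holds the strict pairs (w, t), assumed transitively closed."""
    return w == t or (w, t) in order

def diagram(agents, atoms, order):
    """D_T = {{¬d_t(p), d_w(p)} : w <=_T t, p in At}, as frozensets of (positive, agent, atom)."""
    return {frozenset({(False, t, p), (True, w, p)})
            for w, t, p in product(agents, agents, atoms) if leq(order, w, t)}

def satisfies(valuation, clauses):
    """A clause holds when some literal's truth in `valuation` matches its sign."""
    return all(any(((agent, atom) in valuation) == positive
                   for positive, agent, atom in clause)
               for clause in clauses)

D = diagram({"w", "t"}, {"p"}, {("w", "t")})
print(len(D))                      # 3 clauses: for (w, w), (t, t) and (w, t)
print(satisfies({("t", "p")}, D))  # False: t perceives p but w does not
print(satisfies({("w", "p")}, D))  # True: only the less perceptive agent perceives p
```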

We now have the basic result on the connection between the ordinary resolution
rule and the $T$-resolution rule. This result is fundamental for the rest of
the paper. Once we prove it, we will be able to lift most of the results of
ordinary automated theorem proving with resolution to the case of $D$-clauses
and $T$-resolution.

Theorem 3. Let $S$ be a set of $D$-clauses. Then $\Box \in \mathcal{R}_T(S)$ if and only if $\Box \in
\mathcal{R}(S \cup D_T)$.

Proposition 4. Let $S$ be a set of $D$-clauses. Then there is a $T$-reality satisfying
$S$ if and only if $\Box \notin \mathcal{R}(S \cup D_T)$.

Corollary 5 [Ra91]. Let $S$ be a set of $D$-clauses. Then there is a $T$-reality
satisfying $S$ if and only if $\Box \notin \mathcal{R}_T(S)$.
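Theorem 3 can be spot-checked, though not proved, by saturating small propositional clause sets. The sketch below uses the hypothetical tuple encoding from above. It computes $\mathcal{R}_T(S)$ with the asymmetric rule, and $\mathcal{R}(S \cup D_T)$ with the same code run on an empty strict order, where $\le_T$ collapses to equality. It then reports whether $\Box$ (the empty `frozenset`) lies in each closure. Saturation terminates because only finitely many ground $D$-literals occur.

```python
from itertools import product

def leq(order, w, t):
    """w <=_T t; `order` holds the strict pairs (w, t), assumed transitively closed."""
    return w == t or (w, t) in order

def resolvents(c1, c2, order):
    """res_T resolvents (d_t(p) in c1, ¬d_w(p) in c2, w <=_T t); order=set() gives plain res."""
    out = set()
    for positive, t, p in c1:
        if positive:
            for other_positive, w, q in c2:
                if not other_positive and q == p and leq(order, w, t):
                    out.add((c1 - {(True, t, p)}) | (c2 - {(False, w, q)}))
    return out

def closure(clauses, order):
    """Saturate a clause set under the rule until no new clause appears."""
    known = set(clauses)
    while True:
        new = {r for a, b in product(known, repeat=2) for r in resolvents(a, b, order)} - known
        if not new:
            return known
        known |= new

def diagram(agents, atoms, order):
    return {frozenset({(False, t, p), (True, w, p)})
            for w, t, p in product(agents, agents, atoms) if leq(order, w, t)}

def theorem3_sides(S, agents, atoms, order):
    """(□ in R_T(S), □ in R(S ∪ D_T)); Theorem 3 states these always agree."""
    box = frozenset()
    return (box in closure(S, order),
            box in closure(set(S) | diagram(agents, atoms, order), set()))

order = {("w", "t")}
S1 = {frozenset({(True, "w", "p")}), frozenset({(False, "t", "p")})}
S2 = {frozenset({(True, "t", "p"), (True, "t", "q")}),
      frozenset({(False, "w", "p")}), frozenset({(False, "w", "q")})}
print(theorem3_sides(S1, {"w", "t"}, {"p"}, order))       # (False, False)
print(theorem3_sides(S2, {"w", "t"}, {"p", "q"}, order))  # (True, True)
```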

Let us now restrict ourselves to the case of Horn $D$-clauses. Here, the important
observation is that all the $D$-clauses in the diagram of $T$, $D_T$, are Horn $D$-clauses.


Recall that a Horn $D$-clause is a $D$-clause that contains at most one positive
literal. In our setting, Horn $D$-clauses are of the form

$$\{d_s(p), \neg d_{u_1}(q_1), \ldots, \neg d_{u_k}(q_k)\}$$

or of the form

$$\{\neg d_{u_1}(q_1), \ldots, \neg d_{u_k}(q_k)\}$$

The $D$-clause $\{d_s(p), \neg d_{u_1}(q_1), \ldots, \neg d_{u_k}(q_k)\}$ is called a $T$-program clause and is
usually denoted by

$$d_s(p) \leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k)$$

The second type of Horn $D$-clause is called a goal and is denoted, as in logic
programming, by

$$\leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k)$$
For the notion of linear input resolution see Nerode and Shore [NS93]. We can
consider linear input resolution in our context. That is, we consider a linear
tree starting with a goal and applying $T$-resolution instead of ordinary resolution.

The crucial observation now is that not only does $D_T$ consist of Horn clauses,
but the transformations described in both parts of our Theorem 3 also preserve
linear input resolution.

Specifically, we have:

Theorem 6. Let $S$ be a set of Horn $D$-clauses. Then there is no $T$-reality satisfying
$S$ if and only if $S$ possesses a linear input refutation using the $T$-resolution
rule.

Corollary 7. Let $S$ be a theory consisting of Horn $D$-clauses. The following are
equivalent:

1. $S$ possesses a refutation using the $T$-resolution rule.
2. $S$ possesses a linear input refutation using the $T$-resolution rule.
3. $S \cup D_T$ possesses a linear input refutation using the ordinary resolution rule.
4. $S \cup D_T$ possesses a refutation using the ordinary resolution rule.
5. $S \cup D_T$ is unsatisfiable.
6. There is no $T$-reality satisfying $S$.

4  Consensus  Programs  and  their  processing 

In this section we discuss $T$-programs. Recall from Section 3 that a $T$-program
clause is a $D$-clause of the form

$$C = d_s(p) \leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k)$$

A $T$-program is any set of $T$-program clauses. Intuitively, a $T$-program $P$ describes
a $T$-reality, with various interconnections of agent perceptions. The
$T$-program clause $C$ tells us that in the $T$-reality described by $P$ the following
happens: whenever the agent $u_1$ perceives the fact $q_1$, ..., and the agent $u_k$
perceives the fact $q_k$, then the agent $s$ perceives $p$.

The valuations of $D$-atoms can be put into a one-to-one correspondence with
the subsets of $D$-atoms in the usual fashion. Therefore we will not distinguish
between valuations of $D$-atoms and sets of $D$-atoms. Also, it is obvious that
$T$-realities are naturally ordered by inclusion.

Proposition 8. For every $T$-program $P$ there exists the least $T$-reality satisfying
$P$.

We will now describe a sound and complete method of processing queries to
$T$-programs.

The way we are going to process our queries is a variant of the usual processing
of logic programs and will reflect precisely the difference between the usual
resolution rule and the $T$-resolution rule. To fix the terminology, we shall call this
operation $T$-matching. Formally, a $D$-atom $d_w(p)$ $T$-matches a $T$-program clause
$C = d_s(p) \leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k)$ if $w \le_T s$. It should be clear that when $w = s$,
$T$-matching reduces to the ordinary selection of the $D$-atom $d_s(p)$ for expansion.

The second part of the processing procedure coincides with that of ordinary
logic programming. Suppose we have a goal $\leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k)$ and have selected
within it a $D$-atom $d_{u_i}(q_i)$. Suppose further that $d_{u_i}(q_i)$ $T$-matches a $T$-program
clause

$$d_s(q_i) \leftarrow d_{v_1}(r_1), \ldots, d_{v_m}(r_m)$$

Then we create a new goal

$$\leftarrow d_{u_1}(q_1), \ldots, d_{u_{i-1}}(q_{i-1}), d_{v_1}(r_1), \ldots, d_{v_m}(r_m), d_{u_{i+1}}(q_{i+1}), \ldots, d_{u_k}(q_k)$$

This process is called expansion. Hence, the expansion process corresponds to one
application of $T$-resolution between the current goal and some $T$-program clause
in $P$.

Let $G$ be a goal. We say that the goal $G$ succeeds if there is a sequence of
goals $G_0, \ldots, G_m$ such that $G_0 = G$, $G_m = \Box$, and each $G_{i+1}$ arises from $G_i$ by
selecting a $D$-atom in $G_i$, matching it with some $T$-program clause in $P$, and
expanding.

We now have the following result.

Proposition 9. Let $P$ be a $T$-program and let $G$ be a goal. Then there is no $T$-reality
satisfying $P \cup \{G\}$ if and only if the goal $G$ succeeds.

Let us look at an example.

Example 1. Let $T$ be the partial ordering of Figure 3. Consider the following
simple $T$-program:

$$\begin{aligned} d_u(p) &\leftarrow d_w(q), d_s(r)\\ d_t(q) &\leftarrow\\ d_s(r) &\leftarrow \end{aligned}$$


Fig. 1. Partial Ordering $T$


Given the goal $\leftarrow d_w(p)$, we notice that the $D$-atom in that goal $T$-matches the
first clause in our program. It is not identical with the head of that clause; it only $T$-matches it, because
$w <_T u$. By expansion, this creates a new goal $\leftarrow d_w(q), d_s(r)$. The first $D$-atom
in this goal matches the second clause in our program because $w <_T t$. The
expansion now creates the goal $\leftarrow d_s(r)$. The $D$-atom in this goal matches the
third clause. The result of expansion is now $\Box$, and so the original goal succeeds.
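Below is a minimal propositional sketch of the query procedure, run on Example 1. The figure with the ordering is not reproduced here. The ordering used below ($w$ below both $u$ and $t$, with $s$ incomparable to the others) is therefore an assumption, chosen only to be consistent with the comparisons quoted in the example. The following are also choices of this sketch, not the paper's procedure:

- the selection rule, which always picks the smallest $D$-atom;
- the representation of goals as sets;
- breadth-first search with a visited set.

In the propositional case there are only finitely many goals, so the search terminates.

```python
from collections import deque

# Hypothetical encoding: a D-atom d_w(p) is the pair ("w", "p"); a T-program
# clause d_s(p) <- body is ((s, p), body) with body a tuple of D-atoms.

def leq(order, w, t):
    """w <=_T t; `order` holds the strict pairs (w, t), assumed transitively closed."""
    return w == t or (w, t) in order

def expansions(goal, program, order):
    """Select one D-atom d_w(p) and expand it with every clause it T-matches (w <=_T s)."""
    w, p = min(goal)  # fixed selection rule: the smallest D-atom (an assumption)
    for (s, head_atom), body in program:
        if head_atom == p and leq(order, w, s):
            yield (goal - {(w, p)}) | frozenset(body)

def succeeds(goal, program, order):
    """Breadth-first search for a sequence of expansions that ends in the empty goal."""
    start = frozenset(goal)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if not current:
            return True
        for nxt in expansions(current, program, order):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

order = {("w", "u"), ("w", "t")}  # assumed: w <_T u and w <_T t
program = [(("u", "p"), (("w", "q"), ("s", "r"))),
           (("t", "q"), ()),
           (("s", "r"), ())]
print(succeeds([("w", "p")], program, order))  # True
print(succeeds([("u", "q")], program, order))  # False: u is not below t
```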

The completeness result (Proposition 9) implies the following proposition.

Proposition 10. Let $d_w(p)$ be a $D$-atom and let $P$ be a $T$-program. Then the goal
$\leftarrow d_w(p)$ succeeds if and only if $d_w(p)$ belongs to the least $T$-reality satisfying $P$.

The operator $T_P$ associated with logic programs can be lifted to the present
situation with some modifications. The difference is that, as we compute new
$D$-atoms, we also need to add the $D$-atoms perceived by less perceptive agents.

Specifically, given a $T$-program $P$, define an operator $S_P$ by letting $S_P(M)$ be the
$T$-closure of the set

$$\{d_w(p) : \text{there exists } C \in P,\ C = d_s(p) \leftarrow d_{u_1}(q_1), \ldots, d_{u_k}(q_k),\ d_{u_1}(q_1) \in M, \ldots, d_{u_k}(q_k) \in M,\ w \le_T s\}$$

Proposition 11.
1. The operator $S_P$ is monotone and finitizable.
2. Hence, $S_P$ possesses a least fixpoint, which is equal to $S_P^{\omega}(\emptyset)$.

Similarly to the case of ordinary logic programs, we have the following theorem.

Theorem 12. Let $P$ be a $T$-program. Then
1. The least $T$-reality satisfying $P$ coincides with the least fixpoint of $S_P$.
2. The least fixpoint of $S_P$ coincides with the set of $D$-atoms $d_t(p)$ for which
the goal $\leftarrow d_t(p)$ succeeds.

4.1  Processing  the  consensus  queries 

Now it is clear how we can get a result about processing atomic queries, not
involving the operators $d_t$, for $T$-programs. Such a query to a program $P$ is a query
about consensus in the $T$-reality described by the program $P$. To get the answer
to such a query, say $p$ (where $p$ is an atom), we must check whether all $D$-atoms $d_t(p)$
succeed. It turns out that we do not need to check all atoms $d_t(p)$. It follows
immediately from the basic property (7) of $T$-realities (see Section 2) that it is
enough to check whether the queries $d_t(p)$ succeed for all the maximal elements $t$ of
$T$. Similarly, if a query $d_w(p)$ fails for every minimal element $w$ of $T$, then the
$T$-reality described by our program $P$ satisfies $\neg p$.
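The shortcut via maximal and minimal elements takes only a few lines. The sketch assumes that `M` is already the least $T$-reality, given as `(agent, atom)` pairs and closed downward along $\le_T$. The strict order is a transitively closed set of pairs. The three-agent ordering and the set `M` below are invented for the illustration.

```python
def maximal(agents, order):
    """Agents with nobody strictly above them; `order` holds strict pairs (w, t) meaning w <_T t."""
    return {a for a in agents if not any((a, b) in order for b in agents)}

def minimal(agents, order):
    """Agents with nobody strictly below them."""
    return {a for a in agents if not any((b, a) in order for b in agents)}

def consensus_on(p, M, agents, order):
    """All agents perceive p; checking the maximal agents suffices when M is a T-reality."""
    return all((t, p) in M for t in maximal(agents, order))

def consensus_against(p, M, agents, order):
    """All agents reject p; checking the minimal agents suffices when M is a T-reality."""
    return all((w, p) not in M for w in minimal(agents, order))

agents, order = {"a", "b", "c"}, {("a", "c"), ("b", "c")}  # c is the most perceptive agent
M = {("c", "p"), ("a", "p"), ("b", "p"), ("a", "r")}       # a hypothetical least T-reality
print(consensus_on("p", M, agents, order))       # True
print(consensus_against("q", M, agents, order))  # True
print(consensus_against("r", M, agents, order))  # False: agent a perceives r
```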

At a bigger cost we can now process an arbitrary consensus query. Given
a propositional formula $\varphi$, the formula $\varphi$ is satisfied in the least $T$-reality $M_P$
satisfying the program $P$ if and only if all formulas $d_t(\varphi)$ are satisfied in $M_P$.
Assume for a moment that $\varphi$ does not contain the implication functor, and
that the negation functor appears only in front of atoms.
Clearly, the formula $d_t(\varphi)$ is then logically equivalent to a set of $D$-clauses. Thus we
need to be able to check whether a $D$-clause is satisfied in $M_P$. But such a $D$-clause
$C$ is satisfied in $M_P$ if and only if one of the $D$-literals in $C$ is satisfied in
$M_P$. This, together with the above remarks on testing the validity of $D$-literals
in $M_P$, gives a method for testing consensus for arbitrary propositional formulas.
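The following sketch covers the implication-free case. It assumes two things that the paragraph above only implies:

- $\varphi$ has already been put into conjunctive normal form;
- $d_t$ applied to a negated atom $\neg p$ is read as the $D$-literal $\neg d_t(p)$.

Under these assumptions, each clause of $\varphi$ yields one $D$-clause of $d_t(\varphi)$ for each agent $t$. The sketch checks every agent directly instead of using the maximal/minimal shortcut, and all names are hypothetical.

```python
def d_literal_holds(agent, literal, M):
    """d_t(p) holds iff (t, p) is in M; d_t applied to ¬p is read as ¬d_t(p) (an assumption)."""
    positive, atom = literal
    return ((agent, atom) in M) == positive

def consensus_cnf(cnf, M, agents):
    """phi is given in CNF as a list of clauses of (positive, atom) literals.
    Each clause of phi yields one D-clause of d_t(phi); consensus holds when, for every
    agent t, each such D-clause has a D-literal satisfied in M."""
    return all(any(d_literal_holds(t, lit, M) for lit in clause)
               for t in agents for clause in cnf)

agents = {"a", "b", "c"}
M = {("c", "p"), ("a", "p"), ("b", "p")}
print(consensus_cnf([[(True, "p"), (True, "q")], [(False, "q")]], M, agents))  # True
print(consensus_cnf([[(True, "q")]], M, agents))                               # False
```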

Finally, if $\varphi$ does contain implication and unrestricted negation, then, again,
we can test the consensus about $\varphi$, but now the cost is bigger. This happens because
in the recursive definition of satisfaction we need to consult the perception
of less perceptive agents.

References 

[Ap90] Apt, K. (1990), "Logic Programming". In: Handbook of Theoretical Computer Science, J. van Leeuwen ed., pp. 493-574, MIT Press, Cambridge, MA.

[FHV90] Fagin, R., Halpern, J. Y., Vardi, M. (1988), "Model-Theoretical Analysis of Knowledge", IBM Research Report RJ 6461.

[Fi92] Fitting, M. (1992), "Many-valued Modal Logics II", Fundamenta Informaticae 17, pp. 55-73.

[Ha86] Halpern, J. Y. (ed.) (1986), Theoretical Aspects of Reasoning About Knowledge, Morgan Kaufmann.

[HM84] Halpern, J. Y., Moses, Y. (1984), "Toward a Theory of Knowledge and Ignorance: Preliminary Report", Proceedings of AAAI Workshop on Non-Monotonic Reasoning, pp. 125-143.

[Ma88] Mazer, M. S. (1988), "A Knowledge Theoretic Account of Recovery in Distributed Systems", TARK '88, M. Vardi ed., pp. 309-324.

[Mo92] Moses, Y. (ed.) (1992), Theoretical Aspects of Reasoning About Knowledge, Morgan Kaufmann.

[NS93] Nerode, A., Shore, R. (1993), Logic for Applications, Springer-Verlag.

[Or85] Orłowska, E. (1985), "Mechanical proof methods for Post logics", Logique et Analyse N.S. 28, 110-111, pp. 173-192.

[Or89] Orłowska, E. (1989), "Logic for Reasoning about Knowledge", Z. Math. Logik Grundlag. Math.

[OS86] Orłowska, E., Sanders, J. (1986), "Knowledge Transfer in Distributed Systems", unpublished manuscript.

[Pa90] Parikh, R. (ed.) (1990), Theoretical Aspects of Reasoning About Knowledge, Morgan Kaufmann.

[Pa82] Pawlak, Z. (1982), "Rough Sets", International Journal of Computer and Information Sciences 11, pp. 341-356.

[Ra86] Rasiowa, H. (1986), "Rough Concepts and Multiple-Valued Logic", Proceedings ISMVL '86, IEEE Press, pp. 282-289.

[Ra87] Rasiowa, H. (1987), "Algebraic Approach to Some Approximate Reasonings", Proceedings ISMVL '87, IEEE Press, pp. 342-347.

[Ra88] Rasiowa, H. (1988), "Logic of Approximation Reasoning", Proceedings of CSL '87, Springer LNCS 329, pp. 188-210.

[Ra90] Rasiowa, H. (1990), "On approximation logics: A survey", Kurt Gödel Gesellschaft, Jahrbuch 1990, pp. 63-87.

[Ra91] Rasiowa, H. (1991), "Mechanical Proof Systems for Logic: Reaching Consensus by Groups of Intelligent Agents", International Journal of Approximate Reasoning 5, pp. 415-432.

[RM89] Rasiowa, H., Marek, W. (1989), "On reaching consensus by groups of intelligent agents", Proceedings ISMIS '89, pp. 234-243, North-Holland.

[RS63] Rasiowa, H., Sikorski, R. (1963), The Mathematics of Metamathematics, PWN, Warszawa (3rd ed. 1970).

[Ro65] Robinson, J. A. (1965), "A machine oriented logic based on the resolution principle", Journal of the Association for Computing Machinery 12, pp. 23-41.

[Va88] Vardi, M. Y. (ed.) (1988), Theoretical Aspects of Reasoning About Knowledge, Morgan Kaufmann.


The  Logic  of  Only  Knowing 
as  a  Unified  Framework  for  Non-monotonic  Reasoning 


Jianhua  Chen 

Computer  Science  Department 
Louisiana  State  University 
Baton  Rouge,  LA  70803,  USA 
E-mail:  jianhua@bit.csc.lsu.edu 


Abstract 

We propose to use the logic of only knowing (OL) by Levesque [6]
as a unified framework that encompasses various non-monotonic formalisms
and logic programming. OL is a modal logic which can be
used to formalize an agent's introspective reasoning and to answer epistemic
queries to databases. The OL logic allows one to formally
express the statement "$\alpha$ is all I know" (in symbols, $O\alpha$) and to perform
inferencing based on only-knowing, which is very useful for commonsense
reasoning. Another nice thing about the OL logic is that it has a
clear model-theoretic semantics and a simple proof theory, which is
sound for the quantificational case, and both sound and complete for
the propositional case.

We establish the relations between OL and various non-monotonic
logics (such as default logic and circumscription) and logic programming,
thus extending the existing work relating the OL logic to other non-monotonic
reasoning formalisms (e.g., Levesque showed [6] that
autoepistemic logic can be embedded in OL). This is accomplished by
finding the connection between OL and MBNF, the logic of Minimal
Belief and Negation as Failure proposed by Lifschitz [8, 9], which is
known to have a close relationship with logic programming and other
non-monotonic logics. Our results show that OL can be used as a unified
framework to compare different non-monotonic formalisms based
on the same domain.


1.  Introduction 

In this paper, we investigate the relationship between OL, the logic of only
knowing, and default logic, circumscription, and several logic programming
languages. We show that circumscription and a substantial class of default logic can be
embedded in OL, and that normal logic programs and extended logic programs (with
classical negation) are also included in OL. This is done by connecting OL with the
MBNF logic, which is known to be a general framework for non-monotonic reasoning.
The results in this paper, coupled with existing work relating OL to autoepistemic
logic and using OL to perform epistemic queries, make the case for OL as a
unified framework for non-monotonic reasoning.

The logic of only knowing was proposed by Levesque [6]. The motivation for the
OL logic is to show that some patterns of non-monotonic reasoning can be captured
by using the classical notions of logic (satisfiability, validity, implication). To achieve
that, Levesque extends classical modal logic by formalizing the notion of "only
knowing". He argued that in modeling an agent's reasoning about its own belief and
knowledge, we need to formalize not only the notion "$\varphi$ is believed" (in symbols,
$B\varphi$), but also the notion "$\varphi$ is all that is believed" (in symbols, $O\varphi$). Thus the OL
logic has two modal operators "B" and "O". In [6], a model-theoretic semantics is
defined for OL and a simple proof theory is established. The proof theory is essentially
an extension of the proof theory for the weak S5 modal logic (also called K45). It
is both sound and complete for the propositional case, and sound for the quantificational
case. The nice thing about the proof theory is that with little more than standard
modal logic, we can perform inferencing in OL and epistemic query-answering
for non-monotonic databases. It is shown that the notion of only knowing corresponds
exactly to the notion of stable expansion in the autoepistemic logic (AE logic) of
Moore [15], and thus autoepistemic logic can be embedded in OL.

Obviously, with the capability to express the notion of "only knowing", a
clear model-theoretic semantics, and a simple proof theory, the OL logic is a
very attractive candidate for a general framework for various forms of non-monotonic
reasoning. However, in spite of previous work on clarifying the relationship
between the OL logic and other non-monotonic formalisms [4, 6, 10, 12], the
picture is still not very clear as to how OL relates to other logics and logic programming
languages. In this paper, we attempt to address this issue and thus establish
the link between OL and various non-monotonic logics and logic programming.
This work will enhance our understanding of the nature of the various kinds of
non-monotonicity captured by different formalisms, and it will establish the OL logic as a
unified framework for non-monotonic reasoning. The OL logic formulas we discuss in
this paper are essentially AE logic formulas, and thus this work also clarifies the relationship
between AE logic and other non-monotonic formalisms. As will be seen in Section
4, our approach to the relationship between AE logic and other formalisms (such
as default logic and circumscription) is different from the approaches of Konolige [4],
Marek and Truszczynski [12], and Lifschitz [10].

There are many recent works relating both AE logic and default logic to various
forms of logic programming [2, 3, 16, 18]. Some recent works (e.g., [18]) seem to
suggest that AE logic is not suitable for formalizing logic programs with classical
negation, whose semantics extends the stable model semantics of normal programs,
and that default logic is the right one for such a job. However, we will show in this
paper that the OL logic (and hence the AE logic) does include the extended logic programs
under a simple translation, thereby confirming the suitability of using AE
logic to formalize extended forms of logic programming. This result partially overlaps
with the more recent work of Lifschitz and Schwarz [11], and that of Marek and
Truszczynski [13].

Our approach to the connections between OL and other non-monotonic logics
is to relate the OL logic to MBNF, the logic of minimal belief and negation as failure
by Lifschitz [8, 9]. Lifschitz has shown that MBNF can be used as a general logical
framework, in the sense that various non-monotonic formalisms such as default
logic and circumscription, as well as logic programming, can be embedded in MBNF, and
that epistemic query answering can also be cast in the framework of MBNF. In [1],
we show that a substantial subclass of MBNF theories can be embedded in OL, which
essentially forms the foundation for this paper. The advantage of using OL (instead
of MBNF) as a general logical framework is that OL has a simple proof theory, while
MBNF does not have one.

This paper is organized as follows. In Section 2, we give basic definitions
and briefly review OL and MBNF. We briefly restate the result from [1] regarding the
connection between OL and MBNF in Section 3. The main results are presented in Section
4, in which we establish the relations between OL and default logic, circumscription,
and various logic programming systems. In this report, we focus only on the propositional
logics (OL, MBNF, default logic, logic programming, etc.), and we leave the quantificational
case for future work.

2.  Preliminaries 

We give the basic definitions and briefly review OL and MBNF in this section.
The reader is referred to [6, 8, 9] for a detailed discussion of these two logics. In
both OL and MBNF, we will deal with propositional languages extended by adding
some modal operators ("B" and "O" in OL, "B" and "not" in MBNF). A theory is a
finite set of formulas (axioms). A formula is called objective if it does not include
any modal operators; it is called subjective if each occurrence of an objective formula is
within the scope of a modal operator. Objective theories and subjective theories are
defined similarly.

The structures which define the truth or falsity of a formula $\varphi$ in a modal language
will be different from those used in defining the truth value of ordinary propositional
formulas. Consider a theory $A$ in either OL or MBNF and assume $\{p_1, p_2, \ldots, p_n\}$ are
all the proposition symbols occurring in $A$. An interpretation $I$ is a set of atoms
(propositions) from $\{p_1, p_2, \ldots, p_n\}$. We denote the set of all such interpretations by
$\Omega$. Clearly $\Omega$ has cardinality $2^n$. A structure is of the form $\langle I, S\rangle$, where $I \in \Omega$ is
an interpretation and $S \subseteq \Omega$ is a set of interpretations. Intuitively, $I$ represents the
"real world" and $S$ represents the set of "possible worlds" accessible from $I$. Notice
that $I$ need not be a member of $S$. Essentially, the truth value of a modal formula $\varphi$
will be determined at each structure $\langle I, S\rangle$. Let $\Pi$ be the set of all structures and
define the order relations "$\le$" and "$<$" over $\Pi$ as follows. Let $\langle I_1, S_1\rangle$, $\langle I_2, S_2\rangle$ be
structures in $\Pi$; define $\langle I_1, S_1\rangle \le \langle I_2, S_2\rangle$ if $S_1 \subseteq S_2$, and $\langle I_1, S_1\rangle < \langle I_2, S_2\rangle$ if $S_1 \subset S_2$.
The logics MBNF and OL will focus on the maximal structures which satisfy a
theory in such logics.

2.1  The  logic  of  Only  Knowing 

Levesque developed [6] a modal logic to formulate and reason about an agent's
knowledge and belief. We call it the logic of only knowing (OL). A propositional OL
language is obtained by adding two modal operators B and O to a propositional language.
An OL theory is called basic if it does not contain any occurrence of the O
operator. Given a structure $\langle I, S\rangle$ and formulas $\varphi$ and $\psi$ in OL, we have the following
truth value definitions:

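(1) If $\varphi$ is an atom, then $\varphi$ is true in $\langle I, S\rangle$ if and only if $\varphi \in I$.

(2) $\neg\varphi$ is true in $\langle I, S\rangle$ if and only if $\varphi$ is not true in $\langle I, S\rangle$.

(3) $\varphi \land \psi$ is true in $\langle I, S\rangle$ if and only if both $\varphi$ and $\psi$ are true in $\langle I, S\rangle$.

(4) $B\varphi$ is true in $\langle I, S\rangle$ if and only if for each $J \in S$, $\varphi$ is true in $\langle J, S\rangle$.

(5) $O\varphi$ is true in $\langle I, S\rangle$ if and only if $B\varphi$ is true in $\langle I, S\rangle$ and, for any $J \in \Omega$, $\varphi$ being
true in $\langle J, S\rangle$ implies $J \in S$.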

It is clear that for a structure $\langle I, S\rangle$, the truth value of an objective formula does
not depend on $S$, and the truth value of a subjective formula does not depend on $I$. We
say $\langle I, S\rangle$ satisfies a formula $\varphi$ if $\varphi$ is true in $\langle I, S\rangle$. A structure $\langle I, S\rangle$ is a model of a
theory $A$ if every formula in $A$ is satisfied (true) in $\langle I, S\rangle$. Later on we will see that the
notion of "only knowing" ($O\varphi$) in OL is very similar to the notion of minimal
belief ($B\varphi$) in MBNF. Intuitively, to say that $\langle I, S\rangle$ satisfies $O\varphi$ means that $\langle I, S\rangle$ satisfies
$B\varphi$ and that $S$ is a maximal set of interpretations satisfying $B\varphi$.
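The truth conditions (1)-(5) can be executed directly on small vocabularies. The sketch below is one possible reading, with a hypothetical encoding:

- formulas are nested tuples such as `("O", ("atom", "p"))`;
- interpretations are `frozenset`s of atoms;
- $S$ is a `frozenset` of interpretations.

The checks at the end also illustrate the remark that $O\{\varphi_1, \varphi_2\}$ is not equivalent to $O\varphi_1 \land O\varphi_2$.

```python
from itertools import combinations

def all_interpretations(atoms):
    """Omega: all subsets of the proposition symbols, so len(Omega) == 2 ** n."""
    atoms = sorted(atoms)
    return [frozenset(c) for k in range(len(atoms) + 1) for c in combinations(atoms, k)]

def true_in(phi, I, S, omega):
    """Truth of an OL formula in <I, S>, following clauses (1)-(5)."""
    op = phi[0]
    if op == "atom":  # (1)
        return phi[1] in I
    if op == "not":   # (2)
        return not true_in(phi[1], I, S, omega)
    if op == "and":   # (3)
        return true_in(phi[1], I, S, omega) and true_in(phi[2], I, S, omega)
    if op == "B":     # (4) phi holds at every J in S
        return all(true_in(phi[1], J, S, omega) for J in S)
    if op == "O":     # (5) B phi holds, and every J in Omega where phi holds is in S
        return (true_in(("B", phi[1]), I, S, omega)
                and all(J in S for J in omega if true_in(phi[1], J, S, omega)))
    raise ValueError(f"unknown operator {op!r}")

omega = all_interpretations({"p", "q"})
p, q = ("atom", "p"), ("atom", "q")
mod_p = frozenset(J for J in omega if "p" in J)
mod_pq = frozenset(J for J in omega if {"p", "q"} <= J)
print(true_in(("O", p), frozenset(), mod_p, omega))   # True: S is exactly Mod(p)
print(true_in(("O", p), frozenset(), mod_pq, omega))  # False: {p} satisfies p but is not in S
```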

2.2  Minimal  Belief  and  Negation  as  Failure 

Lifschitz proposed the logic of minimal belief and negation as failure (MBNF)
[8, 9]. Here the presentation conforms to [9]. An MBNF language is formed by
adding the modal operators "B" and "not" to a propositional language. For example,
$A = \{\mathit{not}(p) \to q,\ Br \lor B(s \lor \mathit{not}(q))\}$ is a theory in MBNF. A formula or a theory is
called positive if it does not contain any occurrence of the "not" operator.

The definition of a positive formula $\varphi$ being true in a structure $\langle I, S\rangle$ (i.e., $\langle I, S\rangle$
satisfies $\varphi$) is essentially the same as (1)-(4) above for the OL logic. $\langle I, S\rangle$ is said to
satisfy a positive theory $A$ if every formula in $A$ is true in $\langle I, S\rangle$. However, $\langle I, S\rangle$ is
not necessarily a model of $A$, even if $\langle I, S\rangle$ satisfies $A$. To be a model of a positive theory
$A$, $\langle I, S\rangle$ has to be a maximal structure which satisfies $A$.

To define the notion of a model for a general theory $A$ in MBNF, a triple of the
form $\langle I, S_b, S_n\rangle$ is used, where $I \in \Omega$, $S_b \subseteq \Omega$ and $S_n \subseteq \Omega$. $S_b$ denotes the set of possible
worlds used to determine belief (formulas of the form $B\varphi$), and $S_n$ represents
the set of worlds used to determine negation as failure (formulas of the form
$\mathit{not}(\varphi)$). To be more specific, given a triple $\langle I, S_b, S_n\rangle$ and MBNF formulas $\varphi$ and $\psi$,
the definitions of $\neg\varphi$, $\varphi \land \psi$ and $B\varphi$ being true in $\langle I, S_b, S_n\rangle$ are parallel to (1)-(4)
for the case of positive theories, with $\langle I, S_b, S_n\rangle$ in place of $\langle I, S\rangle$ and $S_b$ in place of
$S$. In addition, $\mathit{not}(\varphi)$ is true in $\langle I, S_b, S_n\rangle$ if and only if there is $J \in S_n$ such that $\varphi$ is
not true in $\langle J, S_b, S_n\rangle$.


For a given theory $A$ and a given set of interpretations $S \subseteq \Omega$, define $\Gamma(A, S)$ to
be the set of all maximal structures $\langle I, S'\rangle$ such that every formula in $A$ is true in $\langle I, S', S\rangle$.
A structure $\langle I, S\rangle$ is a model of $A$ if $\langle I, S\rangle \in \Gamma(A, S)$, i.e., if $\langle I, S, S\rangle$ satisfies $A$, and
there is no proper superset $S'$ of $S$ and no $J \in \Omega$ such that $\langle J, S', S\rangle$ satisfies $A$.

For an objective formula $\varphi$, we use $\mathit{Mod}(\varphi)$ to denote the set of interpretations
which (propositionally) satisfy $\varphi$, i.e., $\mathit{Mod}(\varphi)$ is the set of propositional models of $\varphi$.
Consider the positive theory $A_1 = \{Bp \lor Bq,\ B(r \lor s)\}$. It is not difficult to see that
the models of $A_1$ are of the form $\langle I, \mathit{Mod}(p) \cap \mathit{Mod}(r \lor s)\rangle$ and $\langle I, \mathit{Mod}(q) \cap \mathit{Mod}(r \lor s)\rangle$,
where $I$ is any interpretation. For the theory $A_2 = \{\mathit{not}(p) \to q\}$, the models of $A_2$
are of the form $\langle I, \Omega\rangle$ for any $I \in \mathit{Mod}(q)$. The theory $A_3 = \{\neg\mathit{not}(p)\}$ has no models.
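For tiny vocabularies the MBNF model definition can be checked by brute force over all $S \subseteq \Omega$. The sketch below is a hypothetical encoding:

- `"neg"` is classical negation $\neg$;
- `"naf"` is $\mathit{not}$;
- `"imp"` is $\to$.

It reproduces the stated results for $A_2$ and $A_3$ over the atoms $p, q$. $A_1$ has four atoms, so $\Omega$ has 16 elements and there are $2^{16}$ candidate sets. That is too many for this naive search, so $A_1$ is not included.

```python
from itertools import combinations

def powerset(items):
    items = list(items)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def true3(phi, I, Sb, Sn):
    """Truth of an MBNF formula in the triple <I, Sb, Sn>."""
    op = phi[0]
    if op == "atom":
        return phi[1] in I
    if op == "neg":   # classical negation
        return not true3(phi[1], I, Sb, Sn)
    if op == "and":
        return true3(phi[1], I, Sb, Sn) and true3(phi[2], I, Sb, Sn)
    if op == "imp":
        return not true3(phi[1], I, Sb, Sn) or true3(phi[2], I, Sb, Sn)
    if op == "B":     # belief is evaluated over Sb
        return all(true3(phi[1], J, Sb, Sn) for J in Sb)
    if op == "naf":   # not(phi): phi fails at some J in Sn
        return any(not true3(phi[1], J, Sb, Sn) for J in Sn)
    raise ValueError(f"unknown operator {op!r}")

def mbnf_models(theory, atoms):
    """All <I, S> with <I, S, S> satisfying the theory and no proper superset S2 of S
    such that <J, S2, S> satisfies it for some J (exhaustive; tiny examples only)."""
    omega = powerset(sorted(atoms))
    subsets = powerset(omega)
    sat = lambda I, Sb, Sn: all(true3(f, I, Sb, Sn) for f in theory)
    return [(I, S) for S in subsets for I in omega
            if sat(I, S, S)
            and not any(S < S2 and sat(J, S2, S) for S2 in subsets for J in omega)]

p, q = ("atom", "p"), ("atom", "q")
print(len(mbnf_models([("imp", ("naf", p), q)], {"p", "q"})))  # 2: <{q}, Omega> and <{p,q}, Omega>
print(mbnf_models([("neg", ("naf", p))], {"p", "q"}))          # []: A3 has no models
```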

3.  Between  OL  and  MBNF 

In this section, we restate the results in [1] connecting OL with MBNF. Given
an MBNF or OL theory $A$ which is a finite set of formulas, we use $A$ to denote both
the set and the conjunction of the formulas in the set. Thus we can talk about the formula
(theory) $BA = \{B\varphi \mid \varphi \in A\}$ in MBNF or OL. Similarly, for an OL theory $A$, we
use $OA$ to denote the formula $O\Psi$, where $\Psi$ is the conjunction of all formulas in $A$.
Note that $O\{\varphi_1, \varphi_2\} = O[\varphi_1 \land \varphi_2] \not\equiv O\varphi_1 \land O\varphi_2$.

For any objective formula $\varphi$, we call the formulas $B\varphi$, $\mathit{not}(\varphi)$, $O\varphi$ and their negations
belief literals. Here $B\varphi$ and $\neg B\varphi$ are called B-literals, $\mathit{not}(\varphi)$ and $\neg\mathit{not}(\varphi)$ are
called not-literals, and $O\varphi$ and $\neg O\varphi$ are called O-literals. We call an MBNF theory $A$ simple
if $A$ is a finite set of clauses, each of which is a disjunction of belief literals
of the form

$$C:\ \neg B\varphi \lor \mathit{not}(\psi) \lor B\varphi_1 \lor B\varphi_2 \lor \cdots \lor B\varphi_n \lor \neg\mathit{not}(\psi_1) \lor \neg\mathit{not}(\psi_2) \lor \cdots \lor \neg\mathit{not}(\psi_k).$$

$A$ is said to be X-simple if $A$ is simple and, for each clause $C \in A$, the objective formulas
$\varphi$ and $\psi$ are conjunctions of propositional literals. $A$ is said to be B-simple if $A$ is
simple and, for each clause $C \in A$, $C$ does not contain a belief literal of the form
$\neg B\varphi$. Note that simple theories are all subjective. The X-simple and B-simple classes
include the MBNF theories that correspond to various forms of logic programs discussed
in [8, 9], the theories that correspond to circumscription, and the theories that
correspond to a large class of default theories, etc. The following translation embeds
the classes of X-simple and B-simple theories into OL.

Definition [1] (mapping between MBNF and OL). Define the mapping π, which maps simple theories in MBNF to OL, as follows. Let A be a simple theory in MBNF. The mapping π maps a belief literal in MBNF to a formula in OL:

$$
\begin{aligned}
\pi:\quad B\phi &\mapsto \phi \wedge B\phi\\
\neg B\phi &\mapsto \neg(\phi \wedge B\phi)\\
\mathit{not}(\phi) &\mapsto \neg B\phi\\
\neg\mathit{not}(\phi) &\mapsto B\phi
\end{aligned}
$$

For a given clause C ∈ A, C = L1 ∨ ... ∨ Lt, the mapping π defines the corresponding formula C' in OL, which is of the form C' = L'1 ∨ ... ∨ L't, where each L'i is obtained from Li by applying the mapping π. Let the OL theory A' = {C' : C ∈ A}. We call A' the image of A under π.

Note that in the above definition of π, the belief literal Bφ on the left-hand side of the mapping is in MBNF, while the formula φ ∧ Bφ on the right-hand side is an OL formula. Assume A = {¬not(p) ∨ Bq, not(r) ∨ ¬Bs}. Then the image of A is A' = {Bp ∨ (q ∧ Bq), ¬Br ∨ ¬Bs ∨ ¬s}.
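Because π is purely syntactic, it can be sketched as a short function over clauses. The Python below is an illustration only, not code from [1]: the encoding of a belief literal as a (negated, op, phi) triple is hypothetical, each objective formula is kept as an opaque string (an atom or a parenthesized formula), and ¬(φ ∧ Bφ) is written in its De Morgan form ¬Bφ ∨ ¬φ. Run on the example theory A = {¬not(p) ∨ Bq, not(r) ∨ ¬Bs}, it reproduces the image A' from the example.

```python
from typing import List, NamedTuple


class Lit(NamedTuple):
    """A belief literal of a simple MBNF clause (hypothetical encoding)."""
    negated: bool  # True for ¬Bφ and ¬not(φ)
    op: str        # "B" or "not"
    phi: str       # objective formula, kept as an opaque string


def pi_literal(lit: Lit) -> List[str]:
    """Return the OL disjuncts that π assigns to one belief literal."""
    if lit.op == "B":
        if lit.negated:
            # ¬Bφ ↦ ¬(φ ∧ Bφ), flattened by De Morgan into ¬Bφ ∨ ¬φ
            return [f"¬B{lit.phi}", f"¬{lit.phi}"]
        # Bφ ↦ φ ∧ Bφ
        return [f"({lit.phi} ∧ B{lit.phi})"]
    if lit.op == "not":
        # ¬not(φ) ↦ Bφ and not(φ) ↦ ¬Bφ
        return [f"B{lit.phi}"] if lit.negated else [f"¬B{lit.phi}"]
    raise ValueError(f"unknown operator: {lit.op!r}")


def pi_clause(clause: List[Lit]) -> str:
    """C = L1 ∨ ... ∨ Lt  ↦  C' = L'1 ∨ ... ∨ L't."""
    return " ∨ ".join(d for lit in clause for d in pi_literal(lit))


def image(theory: List[List[Lit]]) -> List[str]:
    """The image A' = {C' : C ∈ A} of a simple theory A under π."""
    return [pi_clause(c) for c in theory]


# A = {¬not(p) ∨ Bq, not(r) ∨ ¬Bs}
A = [
    [Lit(True, "not", "p"), Lit(False, "B", "q")],
    [Lit(False, "not", "r"), Lit(True, "B", "s")],
]
print(image(A))  # ['Bp ∨ (q ∧ Bq)', '¬Br ∨ ¬Bs ∨ ¬s']
```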

Theorem 1 [1]. Let A be an X-simple or B-simple MBNF theory and let A' be its image in OL. Let A_OL be OA' if the structures <I, ∅> are models of A; otherwise let A_OL be OA' ∧ ¬B⊥. Then a structure <I, S> is an MBNF model of A if and only if it is an OL model of A_OL.

According to Theorem 1, the classes of X-simple and B-simple theories can be embedded in OL. In [1], we showed that it is decidable whether structures of the form <I, ∅> are models of A, using the resolution method in [5]. Details are omitted here.

4.  Between  OL  and  Other  Logical  Formalisms 

Once the relationship between OL and MBNF is clarified, we can proceed to clarify the connection of OL with various non-monotonic logics as well as with logic programming, since MBNF is known [8, 9] to be closely related to those formalisms.

4.1  Relations  to  AE  logic,  Default  Logic  and  Circumscription 

AE logic was originally introduced by Moore [15]. An AE logic theory A is nothing but a basic theory in OL. (Here we substitute the modal operator "B" of OL for the operator "L" used in the original definition of AE logic.) Such a theory is intended to capture an agent's self-knowledge or beliefs. The notion of stable expansion plays a central role in AE logic. Intuitively, a stable expansion T of an AE theory A is a meaningful set of beliefs which can be held by a rational agent who has A as initial assumptions. Given an AE theory A and an AE formula φ, the inferencing problem in AE logic is whether or not φ is in every stable expansion of A. The link between the belief set of a model of OA and a stable expansion of A (as an AE logic theory) was established by Levesque. Let W be a set of worlds. The belief set for W is defined to be Belief(W) = {φ | φ is true in OL in the structure <w, W> for every w ∈ W}. The following theorem shows that the inferencing problem in AE logic is straightforwardly solved in OL: φ is in every stable expansion of A if and only if OA ⊨ Bφ.

Theorem [6]. Let A be an AE logic theory. A set W of worlds is a model of OA if and only if Belief(W) is a stable expansion of A.
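For a finite propositional language, the models of OA can be found by brute force: by Levesque's semantics, a set of worlds W satisfies OA exactly when W is the set of all worlds w such that A is true in <w, W>. The sketch below assumes that worlds are truth assignments over a small, fixed set of atoms (a simplification of Levesque's setting) and encodes formulas as Python functions of a world and a set of worlds; all names are illustrative. For the AE theory A = {¬Bp → q}, it finds a single model, W = Mod(q), in which q is believed and p is not, so by the theorem Belief(Mod(q)) is a stable expansion of A.

```python
from itertools import combinations, product

# Hypothetical finite language: worlds are truth assignments to these atoms.
ATOMS = ("p", "q")
WORLDS = list(product((False, True), repeat=len(ATOMS)))


def atom(name):
    i = ATOMS.index(name)
    return lambda w, W: w[i]


def neg(f):
    return lambda w, W: not f(w, W)


def implies(f, g):
    return lambda w, W: (not f(w, W)) or g(w, W)


def B(f):
    # Bf holds in <w, W> iff f holds in <v, W> for every world v in W.
    return lambda w, W: all(f(v, W) for v in W)


def only_knowing_models(theory):
    """Every W with W = {w : <w, W> satisfies all formulas of theory}."""
    models = []
    for k in range(len(WORLDS) + 1):
        for subset in combinations(WORLDS, k):
            W = frozenset(subset)
            sat = frozenset(w for w in WORLDS if all(f(w, W) for f in theory))
            if sat == W:
                models.append(W)
    return models


def believes(W, f):
    """True iff Bf holds at W, i.e. f is true in every world of W."""
    return all(f(w, W) for w in W)


A = [implies(neg(B(atom("p"))), atom("q"))]  # the AE theory {¬Bp → q}
models = only_knowing_models(A)
print(len(models))                     # 1
print(believes(models[0], atom("q")))  # True
print(believes(models[0], atom("p")))  # False
```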


Default logic, defined by Reiter [17], is another important non-monotonic logic. A default logic theory A = <F, D> is a pair consisting of F and D, where F is a propositional theory and D is a set of defaults, each of the form

$$\frac{\alpha : \beta}{\gamma}$$

The meaning of the default α:β/γ is informally the following: if α is provable and β is consistent with what is provable, then infer γ. The notion of extension in default logic plays a role similar to that of stable expansion in AE logic. A set of propositional formulas E is an extension of a default theory A if E is a fixed point of an operator Γ, i.e. Γ(E) = E, where Γ(E) is the smallest set of formulas that contains F, is closed under propositional consequence, and contains γ for every default α:β/γ in D such that α ∈ Γ(E) and ¬β ∉ E.
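To make the fixed-point definition concrete, the sketch below finds the extensions of a default theory in the restricted case where F and every prerequisite, justification, and consequent are conjunctions of literals (compare condition (2) of Theorem 2 below). It is an illustration under stated assumptions, not Reiter's general procedure: literals are strings such as "p" and "-p", a conjunction is a set of literals, a consistent set of literals stands for its own deductive closure, and F is assumed consistent, so no inconsistent set can be an extension. Each candidate E is built from a subset of the defaults and kept when Γ(E) = E, with Γ being Reiter's operator.

```python
from itertools import combinations


def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def consistent(lits):
    return not any(complement(l) in lits for l in lits)


def entails_negation(E, beta):
    """Does Th(E) contain ¬β, for a consistent literal set E and conjunction β?"""
    return not consistent(beta) or any(complement(l) in E for l in beta)


def gamma(F, defaults, E):
    """Reiter's Γ(E) restricted to literals; None stands for an inconsistent set."""
    S = set(F)
    changed = True
    while changed:
        changed = False
        for alpha, beta, cons in defaults:
            if alpha <= S and not entails_negation(E, beta) and not cons <= S:
                S |= cons
                changed = True
        if not consistent(S):
            return None
    return frozenset(S)


def extensions(F, defaults):
    """All E built from F plus consequents of some defaults with Γ(E) = E."""
    found = set()
    for k in range(len(defaults) + 1):
        for chosen in combinations(defaults, k):
            E = frozenset(F).union(*(cons for _, _, cons in chosen))
            if consistent(E) and gamma(F, defaults, E) == E:
                found.add(E)
    return found


# Nixon diamond: F = {q, r}, defaults q:p/p and r:-p/-p
F = {"q", "r"}
D = [
    (frozenset({"q"}), frozenset({"p"}), frozenset({"p"})),
    (frozenset({"r"}), frozenset({"-p"}), frozenset({"-p"})),
]
for E in sorted(extensions(F, D), key=sorted):
    print(sorted(E))
# ['-p', 'q', 'r']
# ['p', 'q', 'r']
```

Enumerating subsets of defaults is exponential; it is meant only to make the definition checkable on toy examples.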


Clearly, AE logic and default logic are closely related. Konolige made the first effort [4] to relate the two. He defines a translation which maps the default α:β/γ to the AE formula Bα ∧ ¬B¬β → γ. He showed that under this translation, each extension of the original default theory is the objective part of a stable expansion of the associated AE theory, but the converse may not be true. Marek and Truszczynski [12] improved on Konolige's results and gave a clear characterization of the relationship between AE logic and default logic. They defined weak extensions of default logic and strong and robust expansions of AE logic, and showed that weak extensions correspond precisely to stable expansions and robust expansions correspond to extensions. Here we take an approach different from both [4] and [12]. We characterize a subclass of default theories and define a slightly different translation under which, for that subclass, extensions correspond exactly to stable expansions.


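Definition. Let A = <F, D> be a default theory. Define the OL translation of A to be the theory

$$OL(A) = F \cup D', \qquad D' = \left\{ \alpha \wedge B\alpha \wedge \neg B\neg\beta \rightarrow \gamma \;\middle|\; \frac{\alpha:\beta}{\gamma} \in D \right\}.$$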


Theorem 2. Let A = <F, D> be a default theory such that F is consistent and A satisfies one of the following two conditions:

(1) Each default in D has no prerequisite, i.e., each default is of the form :β/γ.

(2) F is a conjunction of literals, and for each default α:β/γ ∈ D, α and γ are conjunctions of literals.

Let OL(A) be the OL translation of A. Then a set of objective formulas E is an extension of A if and only if E is the objective part of a consistent stable expansion of OL(A). Consequently, an objective formula φ is in every extension of A if and only if Bφ is true in every model of O[OL(A)].
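Theorem 2 can be checked on a small example by combining the OL translation with a brute-force search for the models of O[OL(A)]. The sketch below is self-contained and rests on the same simplifying assumptions as any finite propositional check: worlds are truth assignments over the atoms p, q, r, formulas are Python functions of a world and a set of worlds, and the helper names are illustrative. For F = q ∧ r with the defaults q:p/p and r:¬p/¬p (which satisfy condition (2)), it finds exactly two models, Mod(p ∧ q ∧ r) and Mod(¬p ∧ q ∧ r), whose objective beliefs are the two extensions Th({p, q, r}) and Th({¬p, q, r}).

```python
from itertools import combinations, product

ATOMS = ("p", "q", "r")
WORLDS = list(product((False, True), repeat=len(ATOMS)))  # (p, q, r) tuples


def atom(name):
    i = ATOMS.index(name)
    return lambda w, W: w[i]


def neg(f):
    return lambda w, W: not f(w, W)


def conj(*fs):
    return lambda w, W: all(f(w, W) for f in fs)


def implies(f, g):
    return lambda w, W: (not f(w, W)) or g(w, W)


def B(f):
    return lambda w, W: all(f(v, W) for v in W)


def ol_default(alpha, beta, gamma):
    """The OL translation α ∧ Bα ∧ ¬B¬β → γ of the default α:β/γ."""
    return implies(conj(alpha, B(alpha), neg(B(neg(beta)))), gamma)


def only_knowing_models(theory):
    """Every W equal to the set of worlds satisfying theory relative to W."""
    models = []
    for k in range(len(WORLDS) + 1):
        for subset in combinations(WORLDS, k):
            W = frozenset(subset)
            if W == frozenset(w for w in WORLDS if all(f(w, W) for f in theory)):
                models.append(W)
    return models


p, q, r = atom("p"), atom("q"), atom("r")
OL_A = [conj(q, r), ol_default(q, p, p), ol_default(r, neg(p), neg(p))]
models = only_knowing_models(OL_A)
print(len(models))  # 2
```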

Now we turn to the links between OL and circumscription. Circumscription, defined by McCarthy [14], is a formalism for non-monotonic reasoning based on the notion of minimal models. Let A(P) be a (consistent) first-order theory with a predicate symbol P occurring in A(P). The circumscription of P in A(P), denoted CIRC(A(P), P), is defined to be the theory whose models are precisely the models of A(P) in which the extension of P is minimized. Lifschitz has shown [8, 9] that for a (first-order or propositional) theory A(P), the circumscription CIRC(A(P), P) can be captured by the MBNF models of the theory BA(P) ∧ (not(P(x)) → B¬P(x)). Correspondingly, we have the following result:

Theorem 3. Let A(P) be a propositional theory. For any objective formula φ, φ is true in all models of CIRC(A(P), P) if and only if Bφ is true in all models of the theory O[A(P) ∧ BA(P) ∧ (¬BP → ¬P ∧ B¬P)].
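For a propositional theory, the minimal-model reading of CIRC(A(P), P) can be sketched directly: a model of A(P) is kept unless some other model agrees with it on every atom except P and makes P false (the other atoms are held fixed, as in basic circumscription). The sketch below is illustrative only; the theory is passed as a Python predicate over truth assignments, and the atom names are assumptions. For A(P) = P ∨ Q it keeps the models {¬P, Q} and {P, ¬Q}, so ¬(P ∧ Q) holds in every model of CIRC(P ∨ Q, P).

```python
from itertools import product

ATOMS = ("P", "Q")  # hypothetical atoms; P is the atom being minimized


def models_of(theory):
    """All truth assignments (as dicts) that satisfy the predicate `theory`."""
    assignments = (dict(zip(ATOMS, vals))
                   for vals in product((False, True), repeat=len(ATOMS)))
    return [m for m in assignments if theory(m)]


def circumscribe(theory, pred):
    """Models of `theory` that are minimal in `pred`, other atoms held fixed."""
    ms = models_of(theory)

    def smaller(m2, m1):
        # m2 < m1: agree on every atom except pred; pred true in m1, false in m2
        same_rest = all(m2[a] == m1[a] for a in ATOMS if a != pred)
        return same_rest and m1[pred] and not m2[pred]

    return [m for m in ms if not any(smaller(m2, m) for m2 in ms)]


def A(m):
    return m["P"] or m["Q"]  # A(P) = P ∨ Q


circ = circumscribe(A, "P")
print(circ)  # [{'P': False, 'Q': True}, {'P': True, 'Q': False}]
print(all(not (m["P"] and m["Q"]) for m in circ))  # True
```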

4.2  Relation  to  Logic  Programming 

The relationship between various non-monotonic logics and logic programming has been studied by a number of researchers [2, 3, 16, 18]. AE logic has been found to have a close connection with the stable model semantics of normal programs [2]. It has also been found that default logic can provide a natural interpretation of stable model semantics. However, for extended logic programs (say, with classical negation as defined by Gelfond and Lifschitz [3]), no existing approach gives a natural interpretation of such programs as AE theories, while there exists an interpretation of extended programs as default theories. Therefore it may appear that default logic (not AE logic) is the one that "correctly" extends the stable semantics in logic programming to handle classical negation. Our result in this subsection shows, however, that AE logic can capture the semantics of extended logic programs. This conclusion concurs with the most recent results of other researchers [11, 13].

For the sake of completeness, we also state in this subsection the (existing) result relating OL and stable model semantics. Recall that a normal logic program is a finite set of rules of the form

$$a \leftarrow b_1, b_2, \ldots, b_m, \mathit{not}\ c_1, \mathit{not}\ c_2, \ldots, \mathit{not}\ c_n,$$

where a and each b_i, c_j are atoms and "not" denotes negation as failure. Define the OL theory I(P) to be the set of formulas obtained by converting each such rule into the formula

$$a \leftarrow b_1 \wedge b_2 \wedge \cdots \wedge b_m \wedge \neg B c_1 \wedge \cdots \wedge \neg B c_n.$$

For a given program P, we use HB(P) to denote the Herbrand base of P, i.e., the set of all atoms appearing in P.

Theorem 4. Let P be a normal logic program and let I(P) be as defined above. Then a set of atoms M is a stable model of P if and only if M = Belief(W) ∩ HB(P) for a model W of O[I(P)], if and only if M = E ∩ HB(P), where E is a stable expansion of I(P).
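For reference, the stable models mentioned in Theorem 4 can be computed for small programs by brute force from the Gelfond-Lifschitz reduct [2]: M is stable if and only if M is the least model of the program obtained by deleting every rule containing some "not c" with c ∈ M and then dropping the remaining "not" literals. The encoding of a rule as (head, positive body, negative body) is an assumption made for this sketch.

```python
from itertools import combinations


def least_model(definite_rules):
    """Least model of a negation-free program, by naive forward chaining."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def reduct(program, M):
    """Gelfond-Lifschitz reduct P^M of a normal program."""
    return [(head, pos) for head, pos, negs in program if not (negs & M)]


def stable_models(program):
    """Every set of atoms M that is the least model of its own reduct."""
    atoms = sorted({a for h, pos, negs in program for a in {h} | pos | negs})
    result = []
    for k in range(len(atoms) + 1):
        for cand in combinations(atoms, k):
            M = set(cand)
            if least_model(reduct(program, M)) == M:
                result.append(M)
    return result


# P = {a <- not b.   b <- not a.   c <- a.}
P = [("a", set(), {"b"}), ("b", set(), {"a"}), ("c", {"a"}, set())]
print([sorted(M) for M in stable_models(P)])  # [['b'], ['a', 'c']]
```

A program such as {p ← not p} has no stable model, and the search above correspondingly returns an empty list for it.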

Gelfond and Lifschitz [3] proposed extended logic programs, which include classical negation in addition to negation as failure. An extended program P is a set of extended rules, each of the form

$$r:\quad l_1 \mid l_2 \mid \cdots \mid l_r \leftarrow l_{r+1}, l_{r+2}, \ldots, l_k, \mathit{not}\ l_{k+1}, \mathit{not}\ l_{k+2}, \ldots, \mathit{not}\ l_n.$$

Here each l_i is a propositional literal and "not" denotes negation as failure. For such extended programs, the notion of an answer set is defined, which forms the semantic foundation for question answering. An answer set of a program P is a set of propositional literals defined by a fixed-point operation [3]. Gelfond and Lifschitz showed that extended disjunctive programs can be embedded in default logic. They also indicated that embedding such programs into AE logic did not seem to be so straightforward. In the following, we give a simple, syntactic translation which embeds extended disjunctive logic programs in OL (and hence in AE logic).

Let P be a disjunctive logic program with classical negation. Translate the above extended rule r into the OL formula

$$(l_1 \wedge Bl_1) \vee (l_2 \wedge Bl_2) \vee \cdots \vee (l_r \wedge Bl_r) \leftarrow l_{r+1} \wedge Bl_{r+1} \wedge l_{r+2} \wedge Bl_{r+2} \wedge \cdots \wedge l_k \wedge Bl_k \wedge \neg Bl_{k+1} \wedge \cdots \wedge \neg Bl_n.$$

Define the OL translation of P to be the theory OL(P), which consists of all the OL formulas obtained by this translation.

Theorem 5. Let P be a disjunctive logic program with classical negation and let OL(P) be the associated OL theory. Then the models of O[OL(P)] are precisely the structures of the form <w, Mod(ANS)>, where ANS is an answer set of P.
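Theorem 5 is stated in terms of answer sets. For small programs, the consistent answer sets can be enumerated from the definition in [3]: S is an answer set if S is a minimal set of literals closed under the reduct P^S, where a set is closed under a disjunctive rule if it contains some head literal whenever it contains the whole positive body. The sketch below generates only consistent answer sets (the special inconsistent answer set, the set of all literals, is not produced), and its encoding of rules as (heads, positive body, negative body) with "-p" for ¬p is a hypothetical one chosen for illustration.

```python
from itertools import combinations


def complement(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def consistent(lits):
    return not any(complement(l) in lits for l in lits)


def reduct(program, S):
    """Drop rules with some `not l`, l ∈ S; drop remaining `not` literals."""
    return [(heads, pos) for heads, pos, negs in program if not (negs & S)]


def closed(rules, S):
    """S is closed if every rule whose body holds in S has a head literal in S."""
    return all(heads & S for heads, pos in rules if pos <= S)


def answer_sets(program):
    """Consistent S that are minimal among sets closed under P^S."""
    lits = sorted({l for h, p, n in program for l in h | p | n})
    result = []
    for k in range(len(lits) + 1):
        for cand in combinations(lits, k):
            S = frozenset(cand)
            if not consistent(S):
                continue
            rules = reduct(program, S)
            if closed(rules, S) and not any(
                closed(rules, frozenset(sub))
                for j in range(len(S))
                for sub in combinations(sorted(S), j)
            ):
                result.append(S)
    return result


# P = {a | b <- .    -c <- a, not d.}
P = [({"a", "b"}, set(), set()), ({"-c"}, {"a"}, {"d"})]
print([sorted(S) for S in answer_sets(P)])  # [['b'], ['-c', 'a']]
```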

Acknowledgement 

I am grateful to Vladimir Lifschitz, Grigori Schwarz, Wiktor Marek and Miroslaw Truszczynski for sending me their recent papers.

References 

[1] J. Chen, Minimal Knowledge + Negation as Failure = Only Knowing (sometimes), to appear in Proceedings of the 2nd International Workshop on Logic Programming and Nonmonotonic Reasoning, 1993.

[2] M. Gelfond and V. Lifschitz, The Stable Model Semantics for Logic Programming, Proceedings of the 5th International Conference on Logic Programming, 1988, pp. 1070-1080.

[3] M. Gelfond and V. Lifschitz, Classical Negation in Logic Programs and Disjunctive Databases, New Generation Computing, 9, 1991, pp. 365-385.

[4] K. Konolige, On the Relation between Default and Autoepistemic Logic, Artificial Intelligence, 35 (3), 1988, pp. 343-382.

[5] S. Kundu, A New Logic of Beliefs: Monotonic and Non-monotonic Beliefs - Part I, Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991, pp. 486-491.

[6] H. J. Levesque, All I Know: A Study in Autoepistemic Logic, Artificial Intelligence, 42 (1-2), 1990, pp. 263-309.

[7] H. J. Levesque, Foundations of a Functional Approach to Knowledge Representation, Artificial Intelligence, 23 (2), 1984, pp. 155-212.

[8] V. Lifschitz, Nonmonotonic Databases and Epistemic Queries, Proceedings of the 12th International Joint Conference on Artificial Intelligence, 1991, pp. 381-386.

[9] V. Lifschitz, Minimal Belief and Negation As Failure, submitted for publication, 1992.

[10] V. Lifschitz, Between Circumscription and Autoepistemic Logic, Proceedings of the 1st International Conference on Knowledge Representation and Reasoning, 1989, pp. 235-244.

[11] V. Lifschitz, G. Schwarz, Extended Logic Programs and Autoepistemic Theories, to appear in Proceedings of the 2nd International Workshop on Logic Programming and Nonmonotonic Reasoning, 1993.

[12] W. Marek, M. Truszczynski, Relating Autoepistemic and Default Logics, Proceedings of the 1st International Conference on Knowledge Representation and Reasoning, 1989, pp. 276-288.

[13] W. Marek, M. Truszczynski, Reflexive Autoepistemic Logic and Logic Programming, to appear in Proceedings of the 2nd International Workshop on Logic Programming and Nonmonotonic Reasoning, 1993.

[14] J. McCarthy, Applications of Circumscription to Formalizing Common Sense Knowledge, Artificial Intelligence, 26 (3), 1986, pp. 89-116.

[15] R. Moore, Semantical Considerations on Nonmonotonic Logic, Artificial Intelligence, 25 (1), 1985, pp. 75-94.

[16] T. Przymusinski, On the Relationship Between Logic Programming and Nonmonotonic Reasoning, Proceedings of AAAI-88, 1988, pp. 444-448.

[17] R. Reiter, A Logic for Default Reasoning, Artificial Intelligence, 13 (1-2), 1980, pp. 81-132.

[18] M. Truszczynski, Embedding Default Logic into Modal Nonmonotonic Logics, Proceedings of the 1st International Workshop on Logic Programming and Nonmonotonic Reasoning, 1991, pp. 151-165.


Terminological Logic Involving Time and Evolution: A Preliminary Report *


Patrick  Lambrix  and  Ralph  Ronnquist 

Department  of  Computer  and  Information  Science 
Linköping University
S-581 83 Linköping, Sweden
patla@ida.liu.se  ralro@ida.liu.se  +46  13  281979 


Abstract. Although terminological logics as well as temporal reasoning have received considerable attention in the knowledge representation community in the last two years, few attempts have been made to integrate these fields. We study the combination of the temporal logic LITE and a terminological logic to obtain a temporal terminological logic. We emphasize defining a terminological logic (T-LITE) where the extensions of concepts are time-dependent in the following sense: first, the individuals belonging to a concept are appearances of objects in a temporal context; second, we allow concepts to be defined in terms of developments of objects. A formal semantics for T-LITE is provided.


1  Introduction 

Although terminological logics as well as temporal reasoning have received considerable attention in the knowledge representation community in recent years, few attempts have been made to integrate those fields. We study the combination of LITE (Logic Involving Time and Evolution) semantics and a base terminological logic so as to extend terminological logic with concepts regarding time and evolution.

Terminological logics or concept languages are languages tailored for expressing knowledge about concepts and concept hierarchies. They are usually given a Tarski-style declarative semantics (see e.g. [Neb90, App. A]), which allows them to be seen as sub-languages of predicate logic. One starts with primitive concepts and roles, and can use the language constructs (such as intersection, union, role quantification, etc.) to define new concepts and roles. Concepts can be considered as unary predicates which are interpreted as sets of individuals, whereas roles are binary predicates which are interpreted as binary relations between individuals. The basic reasoning tasks are unsatisfiability and subsumption checking. A concept is unsatisfiable if it always denotes an empty set. A concept C is subsumed by a concept D if the extension of C is always a subset of the extension of D.

* This work is supported by funds from the Swedish Institute for Technical Development, under grant # 8802819P.

A whole family of knowledge representation systems has been built using these languages, and for most of them complexity results for the subsumption algorithm are known (e.g. KL-ONE [3], BACK [8], CLASSIC [2], KANDOR [9], KRYPTON [4], LOOM [7]).

‘LITE’ ([11], [12]), on the other hand, is a variation of first-order predicate logic in which, in particular, the notion of object is revised from being an indivisible entity into being a temporal structure of versions. Each object version is then indivisible and unchanged in time, while an object changes in time by taking different appearances.

When approaching the task of mixing terminological logic with time, we observe that the idea of a “temporal concept” requires some analysis. On the one hand, a concept may be time-dependent in its extension, i.e. the set of objects satisfying its definition is different at different times. This is the case usually dealt with (e.g. [13]), and it seems to be in some way a “first” or more immediate case.

A concept may also be time-dependent in its definition, by which we mean that an object is included in or excluded from the concept extension depending on which development the object shows. For instance, the concept ‘traffic-light’, defined as a light cycling through green, yellow, and red¹, is such a concept.

The latter kind of concept seems potentially to extend beyond the idea of concept languages, because the extension of a concept is no longer a set of single objects but rather a set of history fragments involving several objects in certain development combinations. For instance, the concept of ‘chess-game’ denotes scenarios involving (at least) two players, a chess board, and chess pieces, where the pieces move over the chess board according to a small set of specific rules. To cover these kinds of concepts in terms of terminological logic, one has to reify, i.e. introduce abstract objects that denote scenarios.

Finally,  a  concept  may  also  change  in  itself,  so  that  its  extension  changes 
because  its  definition  changes.  For  instance  the  concept  of  (legally)  ‘adult’  may 
change  by  decreasing  the  required  age.  This  kind  of  development  is  probably 
even  more  difficult  to  deal  with  within  terminological  logic,  since  for  one  thing, 
it  means  that  subsumption  relationships  change. 

In our study here we use the same base concept language as [13], which formally is a subset of most current terminological logics. However, while the focus in [13] is on the ability to express extensional concept changes by using explicit time references, we focus on using more implicit temporal qualifications for object selection in the sense of classification². Further, we make a start in tackling the second kind of time-dependent concepts, i.e. those where an object is included in or excluded from the concept extension depending on which development the object shows. In section 2 we describe the temporal logic LITE. Then, in section 3, we use the LITE semantics to define the temporal terminological logic T-LITE. We conclude the paper with an example (section 4) and a discussion (section 5).

¹ Subject to national variation.

² It may of course be useful in practice to allow clock metrics, but in our approach, such metrics are contained within the data rather than extending the representation language.


2  LITE 

The LITE ([12]) logic is a first-order predicate logic with an extension allowing objects to be seen as sets of versions. The syntax of the temporal framework is the normal syntax of first-order predicate logic extended by the use of the temporal operator @. In principle, the normal first-order logic semantics is also used, but over versions rather than over objects.

Formally,  a  structure  is  a  tuple  (Vs,Obj,Col,Rel,Func).  Vs  is  the  set  of  distinct 
individuals  that  correspond  to  the  appearances  or  versions  of  the  objects.  We 
have  a  partial  time  order  between  versions.  Versions  belonging  to  the  same  object 
which  are  not  ordered  by  the  partial  time  order  are  said  to  be  parallel  versions. 
Obj  is  the  set  of  objects.  An  object  is  a  set  of  versions.  All  objects  are  pairwise 
disjoint.  Col  is  the  set  of  collections.  A  collection  is  a  set  composed  of  exactly  one 
version  of  each  object  such  that  all  those  versions  are  potentially  contemporary 
(see  below).  The  version  of  object  x  in  the  collection  t  is  denoted  by  x@t.  A 
collection  can  be  seen  as  a  possible  time-slice  over  the  objects.  Func  is  the  set  of 
functions.  A  function  has  its  arguments  in  Vs  or  Col  and  its  image  in  Obj.  Rel 
is  the  set  of  relations.  A  relation  has  its  arguments  in  Vs  or  Col. 

A  formula  is  interpreted  with  respect  to  a  structure  S,  a  collection  i  in  the 
structure  and  a  binding  h  which  maps  variables  to  objects  and  time  variables  to 
collections.  The  principle  is  to  interpret  formulas  in  the  classical  way  with  the 
exception  that  object  references  are  disambiguated  to  versions  by  intersecting 
the  object  with  the  collection  i. 

By  revising  the  notion  of  object  from  being  an  indivisible  entity  into  being 
a  temporal  structure  of  versions,  mixed  time  relations  are  easily  expressed  in 
LITE.  A  sentence  like  “Now,  I  like  young  Plato  but  I  don’t  like  old  Plato”  can 
be  expressed  in  LITE  as  follows. 

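$$\forall t : \big((\mathit{Young}(\mathit{Plato}@t) \rightarrow \mathit{Like}(\mathit{me}@\mathit{now}, \mathit{Plato}@t)) \wedge (\mathit{Old}(\mathit{Plato}@t) \rightarrow \neg\mathit{Like}(\mathit{me}@\mathit{now}, \mathit{Plato}@t))\big)$$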

We  see  that  in  this  sentence  three  different  tenses  are  at  work,  indicated  by 
‘now’,  ‘young’  and  ‘old’.  With  temporal  indexing  on  objects,  ‘now’  qualifies  ‘I’ 
while  ‘young’  and  ‘old’  are  different  qualifications  for  Plato. 

Due to the importance of the ‘collection’ notion, we give its formal definition [12].

Definition 1. A set u of one version of each object is a collection iff there is no object x whose version x@u in u is succeeded by a version a of x that precedes the version y@u of another object y in u.

The time ordering of versions induces an ordering on the collections such that t1 ≼ t2 for collections t1 and t2 if, for all objects x, x@t1 ≼ x@t2. We note that not all collections are related to each other in this way. If t1 ⋠ t2 and t2 ⋠ t1, then t1 and t2 are regarded as parallel collections. This could, for instance, represent different possible ways in which the world can have evolved. Let us assume for the example in figure 1 that t1 = {x1, y1}, t2 = {x2, y2} and t3 = {x3, y2}. Then we have t1 ≼ t2 and t1 ≼ t3, and t2 and t3 are parallel collections. An interval [t1, t2] for t1 ≼ t2 is the set of collections t such that t1 ≼ t ≼ t2.


Fig.  1.  Ordering  between  collections 
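Definition 1 and the induced ordering can be checked mechanically on a finite structure. The sketch below uses a hypothetical encoding that is not part of LITE itself: versions are strings, an object is a set of versions, and the partial time order is given as a set of (earlier, later) pairs that are closed transitively. The order is an assumption chosen to match the description of figure 1 (x1 precedes x2 and x3, which are parallel; y1 precedes y2), plus one extra cross-object pair, x2 before y2, added only so that the definition has something to reject.

```python
from itertools import product


def transitive_closure(pairs):
    closure = set(pairs)
    while True:
        extra = {(a, d) for a, b in closure for c, d in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


OBJECTS = {"x": {"x1", "x2", "x3"}, "y": {"y1", "y2"}}
# Assumed time order for figure 1; ("x2", "y2") is a hypothetical extra pair.
BEFORE = transitive_closure({("x1", "x2"), ("x1", "x3"), ("y1", "y2"), ("x2", "y2")})


def version_at(obj, u):
    """x@u: the unique version of object obj in the collection u."""
    (v,) = OBJECTS[obj] & u
    return v


def is_collection(u):
    """Definition 1: no x@u is succeeded by a version a of x preceding y@u."""
    if any(len(vs & u) != 1 for vs in OBJECTS.values()):
        return False
    for x, y in product(OBJECTS, repeat=2):
        if x == y:
            continue
        xv, yv = version_at(x, u), version_at(y, u)
        for a in OBJECTS[x]:
            if (xv, a) in BEFORE and (a, yv) in BEFORE:
                return False
    return True


def precedes_eq(t1, t2):
    """t1 ≼ t2 iff x@t1 ≼ x@t2 for every object x."""
    return all(version_at(x, t1) == version_at(x, t2)
               or (version_at(x, t1), version_at(x, t2)) in BEFORE
               for x in OBJECTS)


t1, t2, t3 = frozenset({"x1", "y1"}), frozenset({"x2", "y2"}), frozenset({"x3", "y2"})
print(all(is_collection(t) for t in (t1, t2, t3)))  # True
print(is_collection(frozenset({"x1", "y2"})))       # False: x1 < x2 < y2
print(precedes_eq(t1, t2), precedes_eq(t1, t3))     # True True
print(precedes_eq(t2, t3), precedes_eq(t3, t2))     # False False
```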


In the rest of this paper we sometimes write “an object at time t” where we actually mean “the version of the object in collection t”.


3  A  Temporal  Terminological  Logic 

In  our  attempt  to  combine  the  LITE  semantics  with  terminological  logic,  we 
define  a  term  valuation  function  relative  to  a  structure  so  that  the  value  of  a 
concept  is  a  set  of  pairs  (d,t)  of  objects  d  and  collections  t.  We  call  such  a  pair 
a  pointer  which  points  to  the  version  d@t. 


Concept Terms
concept → primitive-concept
    | (and concept⁺)
    | (all role concept)
    | (atleast min role)
    | (atmost max role)
    | (sometimes concept)
    | (always concept)
    | (at concept)
    | (when role concept)
    | (does role)

Role Terms
role → primitive-role
    | (and role⁺)
    | (domain concept)
    | (range concept)
    | (occurs role)
    | (prevails role)
    | (seq role⁺)
    | (loop role⁺)

Fig. 2. Definitions of concept and role terms.


A concept term is a term whose value is a set of pointers, namely the pointers satisfying the particular selection conditions of the term. A role term is a term whose value is a set of pointer pairs that satisfy the role term definition. Our language accepts the concept and role terms shown in figure 2. Their formal semantics are defined in figures 3 and 4.

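Standard Terms

$$
\begin{aligned}
D &= \text{the domain set of all pointers}\\
\varepsilon[(\mathtt{and}\ c_1 \ldots c_n)] &= \textstyle\bigcap_{i=1}^{n} \varepsilon[c_i]\\
\varepsilon[(\mathtt{all}\ r\ c)] &= \{(d,t) \mid \forall d',t' : ((d,t),(d',t')) \in \varepsilon[r] \rightarrow (d',t') \in \varepsilon[c]\}\\
\varepsilon[(\mathtt{atleast}\ m\ r)] &= \{(d,t) \mid |\iota(d,t,r)| \geq m\}\\
\varepsilon[(\mathtt{atmost}\ m\ r)] &= \{(d,t) \mid |\iota(d,t,r)| \leq m\}\\
&\quad\text{where } \iota(d,t,r) = \{d' \mid ((d,t),(d',t)) \in \varepsilon[r]\}\\
\varepsilon[(\mathtt{and}\ r_1 \ldots r_n)] &= \textstyle\bigcap_{i=1}^{n} \varepsilon[r_i]\\
\varepsilon[(\mathtt{domain}\ c)] &= \varepsilon[c] \times D\\
\varepsilon[(\mathtt{range}\ c)] &= D \times \varepsilon[c]
\end{aligned}
$$

Primary Temporal Terms

$$
\begin{aligned}
\varepsilon[(\mathtt{sometimes}\ c)] &= \{(d,t) \mid \exists t' : (d,t') \in \varepsilon[c]\}\\
\varepsilon[(\mathtt{always}\ c)] &= \{(d,t) \mid \forall t' : (d,t') \in \varepsilon[c]\}\\
\varepsilon[(\mathtt{at}\ c)] &= \{(d,t) \mid \exists d' : (d',t) \in \varepsilon[c]\}\\
\varepsilon[(\mathtt{when}\ r\ c)] &= \{(d,t) \mid \forall d' : ((d,t),(d',t)) \in \varepsilon[r] \rightarrow (d',t) \in \varepsilon[c]\}\\
\varepsilon[(\mathtt{occurs}\ r)] &= \{((d,t),(d',t)) \mid \exists t' : d@t = d@t' \wedge ((d,t'),(d',t')) \in \varepsilon[r]\}\\
\varepsilon[(\mathtt{prevails}\ r)] &= \{((d,t),(d',t)) \mid \forall t' : d@t = d@t' \rightarrow ((d,t'),(d',t')) \in \varepsilon[r]\}
\end{aligned}
$$

Fig. 3. Standard Terms and Primary Temporal Terms.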


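Secondary Temporal Terms

$$
\begin{aligned}
\varepsilon[(\mathtt{seq}\ r_1 \ldots r_n)] &= \{((d,t),(d,t')) \mid \exists t_1 \preceq \cdots \preceq t_{n+1} : t = t_1 \wedge t' = t_{n+1} \wedge (\forall i \in \{1,\ldots,n\} : ((d,t_i),(d,t_{i+1})) \in \varepsilon[r_i])\}\\
\varepsilon[(\mathtt{loop}\ r_1 \ldots r_n)] &= \{((d,t),(d,t')) \mid \exists k > 0,\ t_1 \preceq \cdots \preceq t_{kn+1} : t = t_1 \wedge t' = t_{kn+1} \wedge (\forall i \in \{1,\ldots,kn\} : ((d,t_i),(d,t_{i+1})) \in \varepsilon[r_{q(i)}])\}\\
&\quad\text{where } q(i) = ((i + p) \bmod n) + 1 \text{ for some constant } p \in \{0,\ldots,n-1\}\\
\varepsilon[(\mathtt{does}\ r)] &= \{(d,t) \mid \exists t',t'' : t' \preceq t \preceq t'' \wedge ((d,t'),(d,t'')) \in \varepsilon[r]\}
\end{aligned}
$$

Fig. 4. Secondary Temporal Terms.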
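The seq and does clauses can be prototyped over a finite structure. As a simplifying assumption, the sketch below takes the collections to be totally ordered by their position in a list, so that ≼ is ≤ on indices; the object L and the primitive roles to_yellow and to_red, modeling a light that goes from green to yellow to red, are hypothetical. The loop clause is not implemented here.

```python
COLLECTIONS = ["t1", "t2", "t3"]        # assumed total order t1 ≼ t2 ≼ t3
RANK = {t: i for i, t in enumerate(COLLECTIONS)}
OBJECTS = ["L"]                          # hypothetical traffic light
POINTERS = {(d, t) for d in OBJECTS for t in COLLECTIONS}


def leq(t, u):
    """t ≼ u in the assumed total order of collections."""
    return RANK[t] <= RANK[u]


def seq(*roles):
    """ε[(seq r1 ... rn)]: chains (d,t1) -r1-> (d,t2) ... -rn-> (d,t(n+1))."""
    # Start from the identity on pointers and extend each chain by one role
    # at a time, staying on the same object and never moving back in time.
    pairs = {(p, p) for p in POINTERS}
    for role in roles:
        pairs = {(start, (d, t2))
                 for start, (d, t1) in pairs
                 for src, (d2, t2) in role
                 if src == (d, t1) and d2 == d and leq(t1, t2)}
    return pairs


def does(role):
    """ε[(does r)]: every pointer lying within a development selected by r."""
    return {(d, t)
            for (d, t1), (d2, t2) in role if d2 == d
            for t in COLLECTIONS if leq(t1, t) and leq(t, t2)}


# Hypothetical primitive roles: green -> yellow from t1 to t2,
# yellow -> red from t2 to t3.
to_yellow = {(("L", "t1"), ("L", "t2"))}
to_red = {(("L", "t2"), ("L", "t3"))}

cycle = seq(to_yellow, to_red)
print(cycle)                # {(('L', 't1'), ('L', 't3'))}
print(sorted(does(cycle)))  # [('L', 't1'), ('L', 't2'), ('L', 't3')]
```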


Concepts and roles may also be primitive³, in which case their values are directly retrievable in the structure: the concepts as sets of pointers and the roles as sets of pairs of pointers. Intuitively, a primitive concept is seen as a property of an object version, and a primitive role is seen as a binary relation between object versions.

The constructs and, all, atleast, and atmost for concepts and and, domain, and range for roles are regarded as standard terms, and we aim for definitions which follow the standard definitions for these terms as closely as possible. We do, however, introduce some amount of temporal dependency; e.g. the counting of role fillers implied by atmost and atleast occurs within collections.

Although these standard terms do not seem to be time-dependent in the mnemonics of the operators, they really are, of course. Consider for example the following term.

(and male (atleast 1 (and child (range female))))

This term selects the versions of males at the times at which they have at least one daughter. By associating a concept with a set of pointers, we allow a concept to select versions of objects within a temporal context.
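The counting behaviour of atleast can be seen on a toy structure. The sketch below encodes a structure directly as sets of pointers and pointer pairs and implements the and, range, atleast, and atmost clauses of figure 3. The objects TOM and ANN, the two collections, and the primitive concepts and role are hypothetical; the child role is assumed to be filled only in collection t2, so the term above selects (TOM, t2) but not (TOM, t1).

```python
COLLECTIONS = ("t1", "t2")
OBJECTS = ("TOM", "ANN")
D = {(d, t) for d in OBJECTS for t in COLLECTIONS}  # all pointers

# Hypothetical primitive concepts and role, given directly as pointer sets.
male = {("TOM", "t1"), ("TOM", "t2")}
female = {("ANN", "t1"), ("ANN", "t2")}
child = {(("TOM", "t2"), ("ANN", "t2"))}


def and_(*terms):
    """(and c1 ... cn) or (and r1 ... rn): intersection of the values."""
    return set.intersection(*map(set, terms))


def range_(c):
    """(range c) = D × ε[c]."""
    return {(p, q) for p in D for q in c}


def fillers(d, t, r):
    """ι(d, t, r): role fillers of (d, t), counted within the same collection t."""
    return {d2 for (d1, t1), (d2, t2) in r if (d1, t1) == (d, t) and t2 == t}


def atleast(m, r):
    return {(d, t) for (d, t) in D if len(fillers(d, t, r)) >= m}


def atmost(m, r):
    return {(d, t) for (d, t) in D if len(fillers(d, t, r)) <= m}


term = and_(male, atleast(1, and_(child, range_(female))))
print(term)  # {('TOM', 't2')}
```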

³ In [8] a primitive concept is a concept whose meaning can only be partially determined by its description. It seems to serve the same purpose in the sense that all concepts and roles are defined directly or indirectly from the primitive concepts and roles.


The constructs sometimes, always, at, and when for concepts and occurs and prevails for roles are seen as primary temporal terms. They are used for temporal qualifications based on inter-object synchronization.

The  constructs  (sometimes  c)  and  (always  c)  are  object  selectors,  i.e.  they 
select  all  pointers  for  a  particular  object  if  the  object  belongs  to  the  concept  c 
at  some  time,  or  at  all  times  respectively. 

The (at c) construct is the temporal⁴ counterpart of (sometimes c). It selects a whole collection whenever some object in that collection satisfies c. The at term is, for example, most useful in connection with more explicit collection references, so that e.g. August-1990 would be a qualification for some collections.


Fig.  5.  Peter  repaints  his  car. 


The  concept  term  (when  r  c)  is  in  principle  the  same  as  (all  r  c)  but  with 
the  addition  of  the  constraint  that  the  role  filling  pointer  is  synchronized  with 
the  collection  in  which  the  role  is  filled.  The  difference  can  easily  be  .seen  in 
the  following  example.  Consider  an  object  with  the  name  PETER.  Assume  then 
that  a  particular  version  of  PETER  owns  a  red  car  that  changes  to  green®  and 
he  owns  no  other  cars  (see  figure  5).  Then  the  following  terms 

(all  owned-car  red) 

(all  owned-car  green) 

do  not  select  this  particular  PETER  version,  whereas  the  following  terms 

(when  owned-car  red) 

(when  owned-car  green) 

both select this particular PETER version, although within different temporal
contexts ((PETER, t1) for (when owned-car red) and (PETER, t2) for (when
owned-car green)).
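To make the difference between (all r c) and (when r c) concrete, here is a minimal Python sketch of the figure 5 situation. It is a toy model, not the T-LITE semantics: pointers are (object, time) pairs, the single PETER version spans the time points t1 and t2, and, as an assumption, the owned-car role of that version reaches both CAR pointers (LITE lets roles relate objects in different temporal contexts). The names `all_term`, `when_term` and `has_colour` are hypothetical.

```python
# Toy model of figure 5: pointers are (object, time) pairs.
TIMES = ["t1", "t2"]

# The car is red at t1 and green at t2.
colour = {("CAR", "t1"): "red", ("CAR", "t2"): "green"}

# PETER does not change, so one PETER version covers both time points.
peter_version = [("PETER", t) for t in TIMES]

# Assumption: within that version, owned-car relates PETER to both CAR
# pointers, i.e. the role mixes temporal contexts.
owned_car = [("CAR", "t1"), ("CAR", "t2")]


def has_colour(c):
    """Concept as a predicate on pointers: the pointer has colour c."""
    return lambda pointer: colour.get(pointer) == c


def all_term(fillers, concept, version):
    """(all r c): select every pointer of the version if all fillers satisfy c."""
    return list(version) if all(concept(f) for f in fillers) else []


def when_term(fillers, concept, version):
    """(when r c): at each pointer (obj, t) check only the fillers that are
    synchronized with it, i.e. filler pointers at the same time t."""
    return [p for p in version
            if all(concept(f) for f in fillers if f[1] == p[1])]


for c in ("red", "green"):
    print(c, all_term(owned_car, has_colour(c), peter_version),
          when_term(owned_car, has_colour(c), peter_version))
# red [] [('PETER', 't1')]
# green [] [('PETER', 't2')]
```

Neither all-term selects the version, because one of the two fillers always has the other colour; each when-term selects the version at the one time point where the synchronized filler has the required colour.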

Whereas sometimes and always are object selectors with respect to concepts,
occurs and prevails are version selectors with respect to roles. They

* In the sense that it selects collections instead of objects.

** This means that we assume that PETER's appearance does not change when he sees
his car changing color.




Fig. 6. Peter repaints and sells his car.


select all pointers for a particular version as first argument if the version is first
argument for the role r at some time, or at all times, respectively. The role term
occurs extends a contemporary role filling to a contemporary role filling over a
whole version. The role term prevails selects pairs only if the role holds for the
whole version of the pointer which is the first argument.

For  the  example  in  figure  5  both  of  the  following  terms 

(prevails  owned-car) 

(occurs  owned-car) 

select ((PETER, t1), (CAR, t1)) and ((PETER, t2), (CAR, t2)).

For the example in figure 6, (prevails owned-car) does not select any pointer
pair. On the other hand, (occurs owned-car) extends the role to the whole version
and selects ((PETER, t1), (CAR, t1)) as well as ((PETER, t2), (CAR, t2)).
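The prevails condition can be sketched in the same toy style. The sketch assumes a single first-argument version whose time points are listed explicitly, and represents the contemporary role fillings as ((PETER, t), (CAR, t)) pairs. The encoding of figure 6 (an extra time point t3 after the sale) and the function name `prevails` are assumptions. Only prevails is modeled here; occurs is left out of the sketch.

```python
def prevails(pairs, version_times):
    """Keep the contemporary (first, filler) pairs only if the role is filled
    at every time point of the first argument's version."""
    filled_at = {first[1] for first, _filler in pairs}
    return sorted(pairs) if set(version_times) <= filled_at else []


# Contemporary owned-car fillings: the car is owned at t1 and at t2.
pairs = [(("PETER", "t1"), ("CAR", "t1")), (("PETER", "t2"), ("CAR", "t2"))]

# Figure 5: PETER's version spans t1 and t2, so the role holds throughout.
print(prevails(pairs, ["t1", "t2"]))        # both pairs
# Figure 6 (assumed encoding): the car is sold, so PETER's version also
# covers a time point t3 at which no owned-car filling exists.
print(prevails(pairs, ["t1", "t2", "t3"]))  # []
```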

Finally, the constructs does for concepts, and seq and loop for roles, are
introduced as secondary temporal terms, allowing for temporal qualification based
on intra-object synchronization, i.e. object development. These operators might
also be called plan-forming operators.

The  concept  term  does  transforms  a  succession  role  into  a  selection  of  all  the 
pointers  over  which  the  development  appears.  The  outcome  of  a  seq  term  is  the 
transition  from  the  first  to  the  last  object  versions  between  which  the  indicated 
development  appears.  An  object  changing  color  from  green  to  yellow  to  red  in 
succession  could  for  instance  be  defined  by  the  following  term: 

(seq  (domain  green)  (domain  yellow)  (domain  red)) 

Informally,  the  loop  construct  recognizes  a  recurrent  development  corresponding 
to  a  succession  of  equal  sequences. 
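A rough Python sketch of seq and loop over a single object's development may help. It assumes the development is given as a list of states, one per successive object version, and models each (domain c) step as a predicate on a state. Under these assumptions, seq returns the (first, last) version indices of each transition. Loop additionally accepts back-to-back repetitions of the sequence, so every seq result is also a loop result. The helper names are hypothetical.

```python
def domain(colour):
    """Step predicate standing in for (domain colour)."""
    def step(state):
        return state == colour
    return step


def seq(history, steps):
    """Transitions (first, last) over consecutive versions whose states
    satisfy the step predicates in order."""
    n = len(steps)
    return [(i, i + n - 1)
            for i in range(len(history) - n + 1)
            if all(step(history[i + k]) for k, step in enumerate(steps))]


def loop(history, steps):
    """Transitions (first, last) covering one or more back-to-back
    repetitions of the sequence."""
    n = len(steps)
    starts = {first for first, _last in seq(history, steps)}
    result = []
    for i in sorted(starts):
        j = i
        while j in starts:          # extend by one more repetition
            result.append((i, j + n - 1))
            j += n
    return result


history = ["green", "yellow", "red", "green", "yellow", "red", "green"]
steps = [domain("green"), domain("yellow"), domain("red")]
print(seq(history, steps))   # [(0, 2), (3, 5)]
print(loop(history, steps))  # [(0, 2), (0, 5), (3, 5)]
```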


4  Example 

By this example we illustrate two specific issues in using T-LITE. One is the
ability to recognize and classify objects by means of their development in time.
The other is the ability to recognize and classify a compound scenario by means
of how the developments of its components are synchronized. Both of these issues
emphasize the temporal aspects of scenario classification.

An intuitively simple kind of object recognized and classified through its
development is a traffic light. Informally, a traffic light is a light which continuously
changes color from green to yellow to red to green. This can be formulated in
T-LITE as follows. (We use ::= as a definition symbol.)

traffic-light ::= (and light (always (does (loop (domain green)
                                                 (domain yellow)
                                                 (domain red)))))

Traffic lights are posted at the entries to an intersection so that each light
controls a few lanes. The intersection as a whole can then be classified
through the synchronized behavior of the lights. We call an intersection ‘safe’ if
the lights are synchronized so that crossing lanes never show green at the same
time.

We introduce a ‘conflict point’ as an object related to two or more
lanes, each of which is controlled by a traffic light. The conflict point is ‘resolved’
when there is at most one lane whose signal is green at any one time. We obtain
the following:

resolved-point::=  (and  conflict-point 

(atmost  1  (occurs  (and  lane  (range  (when  signal  green)))))) 

The appearance of the occurs term forces the cycling of green signals to be
represented in the development of the resolved point: only one lane is allowed
to have a green signal per resolved-point object version. Without this occurs
term, resolved points whose development is unrelated to the green-light
signaling would be allowed.

A  safe  intersection  is  then  defined  as  follows: 

safe-intersection ::= (and intersection (all crossing (always resolved-point)))

We note that (always resolved-point) stresses the fact that every version of
the resolved point object is classified as a resolved-point. Without the always,
the intersection might have been safe only sometimes, which is not our intention.
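The traffic-light and safe-intersection definitions can be checked on sampled signal timelines with a small Python sketch. Several assumptions are not in the text. Signals are sampled at shared time steps. A new light version starts whenever the colour changes. A light may start at any point of its cycle. A crossing is given as the list of lane names meeting at a conflict point. All function names are hypothetical. The at-most-one-green test is applied at every time step, which plays the role of always.

```python
from itertools import groupby

CYCLE = ["green", "yellow", "red"]


def versions(samples):
    """Collapse a sampled colour timeline into successive versions."""
    return [colour for colour, _run in groupby(samples)]


def is_traffic_light(samples):
    """Successive versions follow green -> yellow -> red -> green ..."""
    vs = versions(samples)
    if not vs or any(c not in CYCLE for c in vs):
        return False
    return all(CYCLE[(CYCLE.index(a) + 1) % 3] == b for a, b in zip(vs, vs[1:]))


def always_resolved(signals, lanes):
    """At every time step at most one lane of the conflict point is green."""
    steps = len(signals[lanes[0]])
    return all(sum(signals[lane][t] == "green" for lane in lanes) <= 1
               for t in range(steps))


def safe_intersection(signals, crossings):
    """Every crossing (conflict point) is resolved at all times."""
    return all(always_resolved(signals, lanes) for lanes in crossings)


signals = {
    "NS": ["green", "green", "yellow", "red", "red", "red", "green"],
    "EW": ["red", "red", "red", "green", "yellow", "red", "red"],
    "EW_bad": ["red", "green", "yellow", "red", "red", "red", "red"],
}
print(all(is_traffic_light(s) for s in signals.values()))  # True
print(safe_intersection(signals, [["NS", "EW"]]))          # True
print(safe_intersection(signals, [["NS", "EW_bad"]]))      # False
```

EW_bad is a well-formed traffic light on its own, but it shows green together with NS at the second time step, so the crossing is not resolved at all times.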

5  Discussion 

An important part of a terminological logic is a subsumption algorithm, ideally
sound, complete and tractable, although this ideal must usually be compromised
([8]). However, similarly to Schmiedel ([13]), we do not at this point have
a subsumption algorithm. In this report, we have concentrated on extending the
syntax and semantics of a basic terminological logic to include temporal information
about the changing extensions of an object, and concepts defined in terms of
the development of objects. Subsumption algorithms for T-LITE can essentially use
the algorithms for our non-temporal base language for the standard terms. This
is clear since, except for the extra synchronization in atleast and atmost, we have
a one-to-one mapping between the terms in the base language and the standard
terms, using pointers as individuals. For the temporal terms we observe that
our semantics give the intended subsumption relations. For instance, (always c)
is subsumed by (sometimes c) and by c. Also, (all r c) is subsumed by (when r c),
and (seq r1 ... rn) is subsumed by (loop r1 ... rn). Again similarly to Schmiedel
([13]), we do not expect complete and tractable subsumption algorithms. We
may, however, be content with a tractable but incomplete algorithm, or with a
complete algorithm with good average behavior. Further, [10], for instance, shows
that a subsumption algorithm which is not complete with respect to standard
semantics may well be complete with respect to a weakened semantics.

Schmiedel's work [13] is based on Allen's interval calculus [1] as temporal
framework. Concepts are associated with functions from an interval-based time
domain to sets of individuals. Roles are associated with functions from this
interval-based time domain to sets of pairs of individuals. (This means that
the extension of a concept can change over time.) Because Allen's interval
calculus is used to define Schmiedel's standard semantics, knowing that a
concept or relation holds over an interval does not automatically give us
any information about whether the concept or role holds during any
subinterval. This problem can be solved by allowing Shoham's hereditary
properties [14] to be associated with a concept or role. In our approach we define
concepts and roles using pointers, i.e. objects at a specific time, where the
specificity directly relates to object change. One advantage is that we have
exploited LITE's ability to express mixed temporal relations. Therefore roles are
allowed to relate objects in different temporal contexts. Further, in our approach
it is the structure itself which tells us whether we can conclude information about
subintervals.

The secondary temporal terms bear some resemblance to the plan description
forming operators of Devanbu and Litman [5]. Their system CLASP is used to
describe and classify plans. Their ‘plan description forming operators’ include
a SEQUENCE and a LOOP operator. For instance

(SEQUENCE a b c)

represents a compound plan consisting of a sequence of a plan of type a, a plan
of type b and a plan of type c. Our seq operator works on roles. A plan could
be seen as a transition function between collections (and therefore a role where
a pointer in one collection is associated with a pointer in the other collection
involving the same object).

(seq a b c)

would then represent all the instantiations of the compound plan as a transition
function.

Recurrence is a crucial element of our common-sense notion of time. As
Koomen [6] puts it:




From our earliest days we learn to perceive time as a result of two important
cognitive abilities; the awareness of change in the world around
us, and the ability to detect regularities in that change. Without change
there is no awareness of anything, and without regularities there is awareness
of chaos only.

However, little work has been done on this topic. In [6] a first-order axiomatization
based on Allen's interval calculus [1] is proposed to reason about recurrence. By
using the LITE semantics and our plan formation operators, we have made a
first step in introducing reasoning about recurrence in terminological logics.

References 

1. Allen, J.F., 'Maintaining Knowledge About Temporal Intervals', in Communications
of the ACM, Vol 26(11), pp 832-843, 1983.

2. Borgida, A., Brachman, R.J., McGuinness, D.L., Resnick, L.A., 'CLASSIC: A Structural
Data Model for Objects', in Proceedings of the International Conference on
Management of Data - SIGMOD89, pp 59-67, 1989.

3. Brachman, R.J., Schmolze, J.G., 'An Overview of the KL-ONE Knowledge
Representation System', in Cognitive Science, 9(2), pp 171-216, 1985.

4. Brachman, R.J., Pigman Gilbert, V., Levesque, H.J., 'An Essential Hybrid Reasoning
System: Knowledge and Symbol Level Accounts in KRYPTON', in Proceedings
of the International Joint Conference on Artificial Intelligence - IJCAI85, pp 532-539,
1985.

5. Devanbu, P.T., Litman, D.J., 'Plan-Based Terminological Reasoning', in Proceedings
of the Conference on Representation of Knowledge - KR91, pp 128-138, 1991.

6. Koomen, J.A.G.M., 'Reasoning about Recurrence', in International Journal of
Intelligent Systems, Vol 6, pp 461-496, 1991.

7. MacGregor, R., Bates, R., 'The LOOM Representation Language', Technical Report
ISI/RS-87-1988, University of Southern California, Information Sciences Institute,
Marina del Rey, CA, 1987.

8. Nebel, B., Reasoning and Revision in Hybrid Representation Systems, Lecture Notes
in Artificial Intelligence, 422, 1990.

9. Patel-Schneider, P., 'Small Can Be Beautiful in Knowledge Representation', in
Proceedings of the IEEE Workshop on Principles of Knowledge-Based Systems, pp 11-16,
1984.

10. Patel-Schneider, P., 'Adding Number Restrictions to a Four-Valued Terminological
Logic', in Proceedings of the National Conference on Artificial Intelligence -
AAAI88, pp 485-490, 1988.

11. Ronnquist, R., 'A Logic for Propagation Based Characterisation of Process
Behaviour', in Proceedings of the International Symposium on Methodologies for
Intelligent Systems - ISMIS90, pp 297-304, 1990.

12. Ronnquist, R., Theory and Practice of Tense-bound Object References, Ph.D.
thesis, nr 270, Dpt. of Computer Science, Linkoping University, 1992.

13. Schmiedel, A., 'A Temporal Terminological Logic', in Proceedings of the National
Conference on Artificial Intelligence - AAAI90, pp 640-645, 1990.

14. Shoham, Y., 'Temporal Logics in AI: Semantical and Ontological Considerations',
in Artificial Intelligence, Vol 33(1), pp 89-104, 1987.


KNOWLEDGE  MANAGEMENT  BY  EXAMPLE 


LEVENT V. ORMAN


Cornell  University,  Malott  Hall,  Ithaca,  NY  14853 
email: orman@cmlgsm


Abstract. Knowledge has many components, such as data,
constraints, queries, transactions, and derivation rules. Data is the
only component that can be managed effectively in large
quantities. All other components are in their infancy in terms of
tools and techniques for efficient storage and retrieval,
implementation and execution, and user specification and design.
One approach to managing all components of knowledge in large
quantities is to reduce them all to data. Many components of
knowledge can be expressed in terms of examples, and examples
are data. As such, all these components can be stored and retrieved
efficiently in large quantities, their execution reduces to data
comparison and can be done in parallel, and they can be specified,
designed, and modified by end users, since examples are more
intuitive and easier to manipulate than general procedures.


Keywords:  Knowledge  engineering.  Knowledge  base  management. 
Knowledge  representation.  Database  constraints.  Integrity 
maintenance.  Rule  base. 




1.  Introduction 

Knowledge has many components. Data, constraints, queries, rules and transactions
all have to be specified and designed, stored and organized for efficient retrieval, and
implemented and executed when necessary. Data is the simplest and most
thoroughly studied component of knowledge. In large quantities, the storage and
retrieval of every component of knowledge except data is problematic. Rule-based
systems, for example, have no efficient way of organizing a large number of rules on
secondary storage devices, and many require all rules to be in main memory.
Moreover, searches through a large number of rules to find matches or
inconsistencies are often sequential, since no efficient indexing or modeling
capabilities exist [10, 12].

Data is the only component of knowledge that can be designed by end users,
using a simple data model consisting of flat files. All other components need to be
specified and designed as procedures or as logical statements, both of which require
professionally trained personnel such as analysts or knowledge engineers. Data is
the only component of knowledge that can be organized and structured for efficient
retrieval from secondary storage devices. Moreover, all such structuring has been
automated by database management systems, relieving users of the task of
designing and optimizing each individual system separately. Finally, the execution of
requests for data has been studied extensively, with a heavy emphasis on
optimization of such requests, while such optimization is in its infancy for all other
components [2, 4, 13].




Since data is the only component of knowledge that is well understood and
successfully studied, representing all components of knowledge as data has
significant and immediate potential for alleviating all the aforementioned problems.
Most components of knowledge can be represented in terms of examples, and
examples are data. As such, they can be stored and structured for efficient retrieval as
data. Their execution often reduces to data comparison, which is simpler than
procedural execution and much simpler than general inferencing with rules. Lastly,
examples are much more intuitive for end users to understand and specify correctly
than many end-user languages. In the following sections, the major components of
knowledge will be expressed as example data and studied as data [11, 16].

2.  Constraints 

Database constraints are logical statements that have to be obeyed by the
data at all times [1, 2, 8, 15]. We restrict ourselves to first-order logic and a closed-world
database, where all facts not in the database are assumed false. Constraints are
critical in policing the database and weeding out incorrect and inconsistent data.
They are usually implemented as procedures to be executed against the database
after each modification to the database. However, they slow down the database
maintenance process considerably, and in realistically large numbers they can bring
system operations to a complete halt. Consequently, their execution is often delayed
and done in batches, leading to long periods of time when the correctness and
consistency of the database cannot be guaranteed. The inefficiency of database
constraints stems from three sources:




a. The inherent difficulty of executing these complex procedures
against large databases [15].

b. The need to check all constraints for every modification to the
database, since there is no efficient method to structure and
organize the constraints so that only a small number of relevant
constraints could be identified and retrieved for testing after each
database modification. Attempts to ameliorate this problem have
produced only minor improvements. One approach involves clustering
constraints with respect to the files they reference, since a
transaction involving a file F can only impact the constraints that
reference F [10]. Another approach involves declarative
specification of constraints, and theorem proving techniques to
locate the constraints that may match a term of the transaction.
This approach requires all constraints to be brought into main
memory and tested for matching terms, since there is no effective
storage structure for constraints to accomplish the term matching
directly on the secondary storage devices [6, 14].

c. The difficulty of maintaining the constraints themselves as they
change over time, since their semantics is buried within
procedures, and often they can only be changed by completely
rewriting them. This problem is partially overcome by declarative
specification of constraints; however, both declarative
specifications and procedures have to be stored as text, and
identifying and retrieving the constraints affected by a change in
requirements is still a significant problem [5, 14].

Expressing  constraints  as  example  data  attacks  all  three  problems 
simultaneously.  Examples  are  intuitive  to  end  users  and  can  be  specified  easily 
without  extensive  professional  help;  examples  can  be  stored  as  data,  structured  into 
database  files,  and  retrieved  efficiently;  and  finally  they  can  be  enforced  efficiently 
since  enforcement  reduces  to  data  comparison  between  constraints  as  example  data 
and  the  database. 

Constraints can be expressed as example data that violate the integrity of the
database. Consider the following relational database for a university environment:

STUDENT (name, dept)

COURSE (course#, dept)

REG (name, course#)

containing the names and departments of students, the course numbers and departments of
courses, and the courses registered for by each student. Also given are the following
database constraints:

C1: The students who take CS100 also take CS200.

C2: All CS students take all CS courses.

They can be expressed as examples violating the database as follows:

C1: A student x takes CS100 but not CS200.

REG(x, CS100), -REG(x, CS200)

C2: There is a CS student x who does not take a CS course y.

STUDENT(x, CS), COURSE(y, CS), -REG(x, y)


Clearly, variables such as x and y stand for unknown data items that would violate
the integrity of the database if any data items existed to take their place.

Once the constraints are expressed as example data, they can be placed in a
constraint base. The constraint base contains the same files and the same structure as
the database, except for a constraint# field attached to each file. The constraint# field
contains a unique identifier for each constraint. The constraint base also contains
negative files such as -REG; however, these negative files can be included in the
positive files such as REG by merely attaching a negative sign to the constraint# of
those negative records. The constraint base for the above constraints is shown below,
where the constraint# field is separated only for visual appeal:


STUDENT
constraint#  name  dept
2            x     CS

COURSE
constraint#  course#  dept
2            y        CS

REG
constraint#  name  course#
1            x     CS100
-1           x     CS200
-2           x     y




In this constraint base environment, locating and retrieving the constraints
relevant to a particular database transaction is a simple database operation. One only
needs to search for the constraints that match the data records involved. The only
assumption needed for matching is that variables (such as x) match all constants.
In a file with m attributes and n records, and assuming a binary search on each
attribute, at most m log2 n records need to be checked to find a match. This number is
much smaller than the mn/2 checks required on average by a sequential search, and it can
be improved even further by utilizing a multiattribute search technique.

Example: A transaction involving STUDENT(SMITH, CS) can only
affect constraint 2, since it matches only the student record STUDENT(x, CS). A
transaction involving REG(SMITH, CS100) may affect constraints 1 and 2, since it
matches the REG records REG(x, CS100) and REG(x, y).
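The per-attribute lookup described above can be sketched in Python. This is an illustration under assumptions, not the paper's storage structure. For each attribute, every constraint file keeps a sorted list of (constant, row) entries that is searched with `bisect`. It also keeps the set of rows holding a variable in that attribute, since variables match every constant. Lowercase terms such as x and y are treated as variables. The REG file includes a -2 row for the -REG(x, y) literal of C2. The class name `ConstraintFile` is hypothetical.

```python
import bisect


def is_var(term):
    """Assumption: lowercase terms such as "x" or "y" are variables."""
    return term.islower()


class ConstraintFile:
    """One file of the constraint base, indexed attribute by attribute."""

    def __init__(self, records):
        # records: (constraint#, value, value, ...); a negative constraint#
        # marks a negative record such as -REG(x, CS200).
        self.records = records
        width = len(records[0]) - 1
        self.constants = []   # per attribute: sorted (constant, row) pairs
        self.var_rows = []    # per attribute: rows holding a variable
        for a in range(width):
            column = [(r[a + 1], i) for i, r in enumerate(records)]
            self.constants.append(sorted(c for c in column if not is_var(c[0])))
            self.var_rows.append({i for v, i in column if is_var(v)})

    def matching_constraints(self, record):
        """Constraint numbers whose records match a data record."""
        candidates = None
        for a, value in enumerate(record):
            index = self.constants[a]
            pos = bisect.bisect_left(index, (value,))  # binary search
            hits = set(self.var_rows[a])               # variables match anything
            while pos < len(index) and index[pos][0] == value:
                hits.add(index[pos][1])
                pos += 1
            candidates = hits if candidates is None else candidates & hits
        return sorted({abs(self.records[i][0]) for i in candidates})


student = ConstraintFile([(2, "x", "CS")])
reg = ConstraintFile([(1, "x", "CS100"), (-1, "x", "CS200"), (-2, "x", "y")])
print(student.matching_constraints(("SMITH", "CS")))  # [2]
print(reg.matching_constraints(("SMITH", "CS100")))   # [1, 2]
```

Each attribute lookup is a binary search, which is where the m log2 n bound comes from. In this sketch the variable rows are simply added to every lookup, so a file with many variables weakens the bound.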

The execution of constraints involves comparing the constraint base to the
database to find a match indicating a violation of database integrity. The
violations can be detected by creating a violation file V for each constraint, where V
contains a column for each variable in the constraint, a positive record for each
match between the positive records of the constraint base and the database, and a
negative record for each match between the negative records of the constraint base
and the database. Subtracting the negative records from the positive records leaves only
the violations in the V file. Each record in the V file is a violation of database
integrity.




Example: Given the following university database:

STUDENT
name   dept
SMITH  CS
JONES  CS

COURSE
course#  dept
CS100    CS
CS200    CS

REG
name   course#
SMITH  CS100
SMITH  CS200
JONES  CS100


Constraint C1 is REG(x, CS100), -REG(x, CS200). The corresponding V1 is

V1 (before subtraction)        V1 (after subtraction)
constraint#  x                 constraint#  x
1            SMITH             1            JONES
1            JONES
-1           SMITH




since REG(x, CS100) matches the database records REG(SMITH, CS100) and
REG(JONES, CS100), and -REG(x, CS200) matches the database record
REG(SMITH, CS200). Subtracting the negative records from the positive records,
we get JONES. Intuitively, JONES is a violation of C1 since he takes CS100 but not
CS200.

Similarly, constraint C2 is STUDENT(x, CS), COURSE(y, CS), -REG(x, y).
The corresponding V2 is:


V2 (before subtraction)          V2 (after subtraction)
constraint#  x      y            constraint#  x      y
2            SMITH  CS100        2            JONES  CS200
2            SMITH  CS200
2            JONES  CS100
2            JONES  CS200
-2           SMITH  CS100
-2           SMITH  CS200
-2           JONES  CS100


since STUDENT(x, CS) matches STUDENT(SMITH, CS) and STUDENT(JONES, CS),
COURSE(y, CS) matches the records COURSE(CS100, CS) and COURSE(CS200, CS),
and -REG(x, y) matches all the records in REG. Subtracting the negative records
from the positive ones, we get JONES CS200. Intuitively, JONES CS200 is a
violation of C2 since JONES is a CS student but fails to take a CS course, namely
CS200.
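The V-file procedure can be sketched as a join followed by a set difference. The sketch rests on three assumptions. A constraint is a list of (sign, file, terms) literals. Lowercase terms are variables. Every variable of a negative literal also occurs in a positive one. The positive literals are joined against the database to give the positive records, each negative literal yields the negative records, and subtracting them leaves the violations. On the university database above it should reproduce the final V1 and V2. The function names are hypothetical.

```python
def is_var(term):
    """Assumption: lowercase terms such as "x" or "y" are variables."""
    return term.islower()


def match(pattern, record, binding):
    """Extend binding so that pattern matches record, or return None."""
    b = dict(binding)
    for p, v in zip(pattern, record):
        if is_var(p):
            if b.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return b


def bindings(literals, db):
    """All variable bindings satisfying a conjunction of positive literals."""
    results = [{}]
    for file, terms in literals:
        extended = []
        for b in results:
            for record in db.get(file, []):
                m = match(terms, record, b)
                if m is not None:
                    extended.append(m)
        results = extended
    return results


def violation_file(constraint, db):
    """Positive records minus negative records = violations."""
    positive = [(f, t) for sign, f, t in constraint if sign == "+"]
    negative = [(f, t) for sign, f, t in constraint if sign == "-"]
    rows = bindings(positive, db)
    for file, terms in negative:
        neg_vars = [t for t in terms if is_var(t)]
        # Negative records: bindings for which the negated fact IS present.
        neg_rows = {tuple(b[v] for v in neg_vars)
                    for b in bindings([(file, terms)], db)}
        rows = [b for b in rows if tuple(b[v] for v in neg_vars) not in neg_rows]
    return rows


db = {
    "STUDENT": [("SMITH", "CS"), ("JONES", "CS")],
    "COURSE": [("CS100", "CS"), ("CS200", "CS")],
    "REG": [("SMITH", "CS100"), ("SMITH", "CS200"), ("JONES", "CS100")],
}
C1 = [("+", "REG", ("x", "CS100")), ("-", "REG", ("x", "CS200"))]
C2 = [("+", "STUDENT", ("x", "CS")), ("+", "COURSE", ("y", "CS")),
      ("-", "REG", ("x", "y"))]
print(violation_file(C1, db))  # [{'x': 'JONES'}]
print(violation_file(C2, db))  # [{'x': 'JONES', 'y': 'CS200'}]
```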

In  general,  not  all  constraints  need  to  be  checked  for  every  transaction. 
Integrity  can  be  maintained  incrementally,  by  assuming  integrity  before  a 
transaction,  and  ensuring  that  the  transaction  does  not  violate  it.  Integrity 
maintenance  then  involves  two  separate  tasks. 

a. The identification and retrieval of relevant constraints for each
transaction.

b.  The  execution  of  those  constraints  against  the  database  to  detect 
possible  integrity  violations.  Moreover,  the  execution  does  not 
need  to  involve  the  whole  database,  and  the  identification  and 
retrieval  of  the  relevant  portion  of  the  database  is  another  possible 
source  of  efficiency. 

Both insertions and deletions of records can be handled by
merely matching the records involved against the constraint base, identifying the
constraints involved. The only additional assumption is that insertions are
required to match only the positive records, and deletions only the negative
records. Once the relevant constraints are identified, a V file is created for each to
detect violations. The database is also restricted to the data records involved.

Example: Given the constraints:

C1: REG(x, CS100), -REG(x, CS200)

C2: STUDENT(x, CS), COURSE(y, CS), -REG(x, y)


Inserting the record REG(DOE, CS200) into the REG file fails to match any
constraint records, since the only records that match CS200 are negative ones.
Consequently, no violations can occur and no further checks are necessary. Similarly,
deleting the record (SMITH, CS) from the STUDENT file can cause no violations,
since -STUDENT(SMITH, CS) fails to match any constraint records. On the other
hand, inserting the record STUDENT(DOE, CS) into STUDENT will cause a match
with STUDENT(x, CS), and hence the constraint C2 may be violated. Restricting
x=DOE and forming V2:


V2
constraint#  x    y
2            DOE  CS100
2            DOE  CS200


which indicates a violation. Intuitively, the violation follows from the fact that all
CS students have to take all CS courses. DOE is a CS student, but takes neither
CS100 nor CS200. Note that V2 is a restricted V file. It does not need to be fully
developed for a given transaction; it is restricted to the matches found, x=DOE in
this case, resulting in a very efficient enforcement algorithm.


Similarly, deleting the record REG(SMITH, CS100) may violate the
constraint C2, since -REG(SMITH, CS100) matches -REG(x, y). Restricting
x=SMITH and y=CS100, and developing V2:

V2
constraint#  x      y
2            SMITH  CS100


V2 does not contain a record -2 SMITH CS100, since REG(SMITH, CS100) has been
deleted, and hence a violation results. The violation follows intuitively from the fact
that a CS student (SMITH) fails to take a CS course (CS100).
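The incremental scheme of this section can be sketched by adding two things to a plain V-file computation. The first is sign-aware matching of the changed record against the constraint base: insertions are matched against positive records and deletions against negative ones. The second is a seed binding that restricts the V file to the matches found. As before, this is a hypothetical Python illustration in which lowercase terms are variables. It replays the four transactions discussed above.

```python
def is_var(term):
    """Assumption: lowercase terms such as "x" or "y" are variables."""
    return term.islower()


def match(pattern, record, binding=None):
    """Extend binding so that pattern matches record, or return None."""
    b = dict(binding or {})
    for p, v in zip(pattern, record):
        if is_var(p):
            if b.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return b


def bindings(literals, db, seed):
    """Bindings that extend seed and satisfy all positive literals."""
    results = [dict(seed)]
    for file, terms in literals:
        extended = []
        for b in results:
            for record in db.get(file, []):
                m = match(terms, record, b)
                if m is not None:
                    extended.append(m)
        results = extended
    return results


def restricted_violations(constraint, db, seed):
    """V file restricted to seed: positive records minus negative records."""
    positive = [(f, t) for sign, f, t in constraint if sign == "+"]
    rows = bindings(positive, db, seed)
    for sign, file, terms in constraint:
        if sign == "-":
            neg_vars = [t for t in terms if is_var(t)]
            present = {tuple(b[v] for v in neg_vars)
                       for b in bindings([(file, terms)], db, {})}
            rows = [b for b in rows
                    if tuple(b[v] for v in neg_vars) not in present]
    return rows


def check(db, constraints, op, file, record):
    """Apply an insertion or deletion, then test only the constraints whose
    records match it: positive records for insertions, negative for deletions."""
    new_db = {f: list(rs) for f, rs in db.items()}
    if op == "insert":
        new_db[file].append(record)
    else:
        new_db[file].remove(record)
    wanted = "+" if op == "insert" else "-"
    violations = {}
    for cid, constraint in constraints.items():
        for sign, f, terms in constraint:
            if (sign, f) != (wanted, file):
                continue
            seed = match(terms, record)
            if seed is None:
                continue
            rows = restricted_violations(constraint, new_db, seed)
            if rows:
                violations.setdefault(cid, []).extend(rows)
    return violations


db = {
    "STUDENT": [("SMITH", "CS"), ("JONES", "CS")],
    "COURSE": [("CS100", "CS"), ("CS200", "CS")],
    "REG": [("SMITH", "CS100"), ("SMITH", "CS200"), ("JONES", "CS100")],
}
constraints = {
    1: [("+", "REG", ("x", "CS100")), ("-", "REG", ("x", "CS200"))],
    2: [("+", "STUDENT", ("x", "CS")), ("+", "COURSE", ("y", "CS")),
        ("-", "REG", ("x", "y"))],
}
print(check(db, constraints, "insert", "REG", ("DOE", "CS200")))     # {}
print(check(db, constraints, "delete", "STUDENT", ("SMITH", "CS")))  # {}
print(check(db, constraints, "insert", "STUDENT", ("DOE", "CS")))
print(check(db, constraints, "delete", "REG", ("SMITH", "CS100")))
```

The last two calls should report the restricted V2 records for x=DOE and for x=SMITH, y=CS100, respectively. The pre-existing JONES CS200 violation is never examined, because the seed binding confines each check to the changed record.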

3.  Extensions 

In the general case, a constraint language requires the power of first-order
logic, and that power can be acquired by extending the constraint base with negative
variables such as -x, meaning "no such x" [7]. These constraints are less frequent in
real-life databases; however, the constraint base can be extended to handle them for
theoretical completeness.


Example: The following constraints indicate a violation of database
integrity if:

C3: There is a CS student taking no courses.

STUDENT(x, CS), REG(x, -y)

C4: Every CS student takes at least one course (i.e. let x be a student
taking no CS courses; there is no such CS student).

STUDENT(-x, CS), REG(x, -y)

Such  general  constraints  can  easily  be  stored  in  a  constraint  base.  The  only 
additional  requirement  is  the  storage  of  negative  variables.  The  constraint  base 
containing  C3  and  C4  is  shown  below: 


STUDENT
constraint#  name  dept
3            x     CS
4            -x    CS

REG
constraint#  name  course#
3            x     -y
4            x     -y
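Negative variables can be added to the same kind of sketch by treating a literal such as REG(x, -y) as an anti-join. A binding survives when no record agrees with it on the ordinary positions; the position holding -y acts as a wildcard. The Python below is a hypothetical illustration for C3 only. Lowercase terms are variables, and a leading minus sign marks a negative variable. It does not attempt the negated x of C4.

```python
def is_var(term):
    """Assumption: lowercase terms such as "x" are ordinary variables."""
    return term.islower() and not term.startswith("-")


def is_neg_var(term):
    """A negative variable such as "-y", read as "no such y"."""
    return term.startswith("-") and is_var(term[1:])


def agrees(terms, record, binding):
    """True if record agrees with binding on every position that does not
    hold a negative variable; unbound variables match anything."""
    for t, v in zip(terms, record):
        if is_neg_var(t):
            continue
        if is_var(t):
            if binding.get(t, v) != v:
                return False
        elif t != v:
            return False
    return True


def violations(positive, negvar_literals, db):
    """Bindings of a single positive literal for which no record satisfies
    any of the negative-variable literals (an anti-join)."""
    file, terms = positive
    rows = [{t: v for t, v in zip(terms, record) if is_var(t)}
            for record in db[file] if agrees(terms, record, {})]
    return [b for b in rows
            if not any(agrees(t, r, b)
                       for f, t in negvar_literals for r in db[f])]


db = {
    "STUDENT": [("SMITH", "CS"), ("JONES", "CS")],
    "REG": [("SMITH", "CS100"), ("SMITH", "CS200"), ("JONES", "CS100")],
}
C3_POSITIVE = ("STUDENT", ("x", "CS"))
C3_NEGVAR = [("REG", ("x", "-y"))]
print(violations(C3_POSITIVE, C3_NEGVAR, db))  # []
db["STUDENT"].append(("DOE", "CS"))            # a CS student with no courses
print(violations(C3_POSITIVE, C3_NEGVAR, db))  # [{'x': 'DOE'}]
```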




BIBLIOGRAPHY 


1. Goldstein, R.C., Storey, V.C., Unraveling IS-A Structures, Information
Systems Research 3, 2, 1992.

2. Hammer, M., McLeod, D., Database Description With SDM: A Semantic
Data Model, Transactions on Database Systems 6, 3, 351-386, 1981.

3. Hull, R., Yap, C.K., The Format Model: A Theory of Database
Organization, Journal of the ACM 31, 3, 518-537, 1984.

4. Lochovsky, F.H., Tsichritzis, D.C., Data Models, Prentice Hall, 1982.

5. Morgenstern, M., Constraint Equations: Declarative Expression of
Constraints with Automatic Enforcement, VLDB, 33-42, 1984.

6. Nicolas, J.M., Logic for Improving Integrity Checking in Relational
Databases, Acta Informatica 18, 227-253, 1982.

7. Orman, L.V., Functional Development of Database Applications,
Transactions on Software Engineering 14, 9, 1280-1292, 1988.

8. Orman, L.V., Constraint Maintenance as a Database Design Criterion,
Computer Journal 34, 1, 73-79, 1991.

9. Shephard, A., Kerschberg, L., PRISM: A Knowledge Based System for
Semantic Integrity Specification and Enforcement in Database Systems,
Proceedings of the SIGMOD Conference, 307-315, 1984.

10. Stonebraker, M., et al., The Postgres Rules System, Transactions on
Software Engineering 14, 7, 1988.

11. Tansel, A.U., Arkun, M.E., Ozsoyoglu, G., Time By Example Query
Language for Historical Databases, Transactions on Software Engineering
15, 4, 464-478, 1989.

12. Tsur, S., Zaniolo, C., A Logic-Based Data Language, VLDB, 1986.

13. Ullman, J.D., Principles of Data and Knowledge Base Systems, Computer
Science Press, 1989.

14. Urban, S.D., Delcambre, L.M.L., Constraint Analysis: Specifying
Operations on Objects, Transactions on Knowledge and Data Engineering 2,
4, 391-400, 1990.

15. Weber, R., EDP Auditing, McGraw-Hill, 1982.

16. Zloof, M.M., Query by Example: A Database Language, IBM Systems
Journal 16, 4, 1977.


System Reorganization and Load Balancing of
Parallel Database Rule Processing *


Hasanat M. Dewan, Salvatore J. Stolfo
Department of Computer Science
Columbia University
New York, NY 10027, USA

Abstract

In the coming decade, high-speed network computing using processors
that are orders of magnitude faster than the platforms available today
will enable the integration and coalescing of vast amounts of information
stored in diverse databases. This will provide unprecedented new
opportunities for acquiring new knowledge by applying various inferential processes
against such massive databases. Meeting this challenge requires significant
advances in our understanding of how to build efficient, high-performance
knowledge-base systems targeted to run on a variety of parallel and
distributed hardware architectures.

We address these concerns in the context of the PARADISER distributed
rule processing system. We present an approach that combines statically
computed restrictions on rule programs, which partition the workload of rule
evaluation among an arbitrary number of processing sites, with dynamic load
balancing protocols that update and reorganize the distribution of workload
at runtime. Finally, we analyze the dynamic load balancing protocols in terms
of efficiency and scalability criteria.


1  Introduction 

In the PARADISER (PARAllel and DIStributed Environment for Rules) project,
we are developing a complete database rule processing environment for expert
database construction in a parallel and distributed setting. PARULEL (PARallel
RULE Language), the kernel language supported by this environment, has
set-oriented execution semantics. It provides language features that support
negation, aggregation, external functions, objects, and flexible control
structures for manipulating large amounts of disk-resident data. It computes in
cycles, each consisting of match, programmable conflict resolution, and fire
phases. PARADISER provides the infrastructure that allows PARULEL programs to be
compiled, distributed, and evaluated efficiently [Stolfo et al., 1991; Dewan et
al., 1992; Wolfson et al., 1991]. Our earlier experiences on the DADO project
[Stolfo, 1984; Stolfo et al., 1985] showed that under various schemes for
evaluating rules in parallel, there is a large variance in runtime among the
distributed tasks. This issue has received little treatment in the literature
on parallel processing of rule programs. This paper is concerned with the
details of the distributed evaluation schemes in PARADISER, and the associated
static and dynamic load balancing protocols.

*This work has been supported in part by the New York State Science and
Technology Foundation through the Center for Advanced Technology, and in part
by NSF CISE grant CDA-90-24735.

The processing of a set of rules in a distributed expert database system may be
either asynchronous or synchronous. In the asynchronous model, the sites may be
viewed as independent threads of execution. When a thread interferes with
another thread by removing facts it used at an earlier inference cycle, the
competing threads must be synchronized to maintain consistent operation
[Dewan et al., 1992; Wolfson et al., 1991]. We have previously shown that this
may be very expensive in the general case [Ohsie et al., 199?]. To avoid these
problems, PARADISER synchronizes the sites at each cycle boundary, exchanging
new information as needed¹. Another key design decision is whether to use a
replicated or a distributed (fragmented) database. While a distributed
evaluation scheme should ideally use a distributed database, this may place very
high demands on the underlying communication channels, because the distributed
database must be reorganized at runtime to balance the load among the sites.
Moreover, the key problems can be studied in either setting. Thus, for
simplicity, the current implementation uses a fully replicated database. We
have, however, made substantial progress toward accommodating distributed
databases, details of which are the subject of a separate report
[Dewan and Stolfo, 199?].

In PARADISER, copy-and-constrain [Stolfo et al., 1985] is used both for
compile-time workload distribution and for reorganizing the distributed
computation at runtime for balanced operation. Program replicas are produced
that are constrained to match smaller, disjoint subsets of the original
database, reducing the processing effort at each site. To alleviate the
inefficiencies introduced by synchronous operation, we seek to minimize the
variance in runtime among the sites at each cycle. This is achieved by keeping
statistics on the distribution of data in the base relations. Initially, we
partition the workload evenly by copying and constraining program replicas
using "restriction predicates", or simply "restrictions", attached to the
rules. Each rule in a program replica contains a restriction on an attribute of
some relation appearing on its left hand side. This attribute is called the
restriction attribute (RA) for the rule. The restrictions limit the amount of
data inspected by each rule, and hence by the entire program replica. The
Isoweight Copy-and-Constrain (ICC) algorithm, outlined in Section 4.1, uses
compile-time statistics to derive initial restrictions for the rules. When the
initial processing allocation fails to attain a balanced load, the restrictions
are adjusted at runtime so that the sites operate in balance on subsequent
cycles. This is achieved via the Dynamic Restriction Adjustment (DRA)
algorithm, which is outlined in Section 4.2.

¹The updates are performed using a lazy protocol, making this apparent barrier
synchronization point non-strict.

The DRA algorithm is applied as part of a dynamic load balancing (DLB) protocol
for detecting and correcting an unbalanced system. The particular DLB protocol
for detecting imbalance and the DRA algorithm for correcting it constitute
overhead not paid in the sequential case. The former of these sources deserves
attention because, as the number of sites is scaled up, the communication costs
of determining the imbalance of the ensemble can grow quickly and may dominate
the overhead costs. Thus, the choice of DLB protocol is crucial for minimizing
this overhead.

We compare various DLB protocols by analyzing their "isoefficiency functions".
Isoefficiency is a formalism proposed in [Kumar and Rao, 1987], and further
studied in [Kumar et al., 1991], that provides a measure of the scalability of
an algorithm-architecture combination. The isoefficiency analysis reported here
is guiding our design decisions in the PARADISER environment.

The rest of this paper is organized as follows. Section 2 contains basic
definitions. Section 3 discusses the copy-and-constrain technique and presents
heuristics for choosing restriction attributes. In Section 4, we outline the ICC
and DRA algorithms for database partitioning and provide an example of their
application. Section 5 introduces a collection of dynamic load balancing
protocols. A framework for analyzing these protocols is given in Section 6, and
analysis of the protocols is given in Section 7.

2  Basic  Definitions  and  Terminology 

Rule-based expert systems typically follow the Production System (PS) model,
consisting of a set of rules and a database of facts (working memory or WM). A
rule consists of a left hand side (LHS), which is a conjunction of condition
elements or "patterns" that may match elements of the WM. The right hand side
(RHS) of a rule consists of actions that modify the database. The actions may be
performed if the LHS is satisfied, thus producing a rule instance. The
operational semantics of conventional rule languages such as OPS5 calls for the
selection of a single instance from the set of instances by applying some
selection strategy, also known as the conflict resolution method, and executing
the actions associated with that instance. This procedure is carried out in a
cycle consisting of the well-known match-select-act phases, until a predefined
termination condition (usually a fixpoint) is reached. In the relational model,
WM can be viewed as a relational database, and both the LHS and RHS of rules may
be expressed as database queries. To the extent that the relational model can be
mapped onto databases formed from collections of persistent objects, the same
ideas apply to object-oriented databases as well. The LHS translates into joins
and selections over the corresponding relations of the database that map the
working memory in question [Hanson and Widom, 1992].
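To make the cycle concrete, here is a minimal Python sketch of the match-select-act loop over an in-memory working memory. The two rules (an edge/path transitive closure), the use of `min` as the conflict resolution strategy, and all function names are illustrative assumptions, not OPS5 or PARULEL code. The second runner fires every eligible instance per cycle, loosely mimicking set-oriented semantics without any programmable conflict resolution.

```python
from itertools import product


def match(wm):
    """Match phase for two illustrative rules over a set of fact tuples.

    r_base:  edge(x, y)             -> path(x, y)
    r_trans: edge(x, y), path(y, z) -> path(x, z)
    Returns (rule name, fact to add) instances whose action would change WM.
    """
    edges = [f for f in wm if f[0] == "edge"]
    paths = [f for f in wm if f[0] == "path"]
    instances = [("r_base", ("path", x, y)) for _, x, y in edges]
    for (_, x, y), (_, y2, z) in product(edges, paths):
        if y == y2:  # join on the shared variable y
            instances.append(("r_trans", ("path", x, z)))
    return [inst for inst in instances if inst[1] not in wm]


def run_single_instance(wm):
    """OPS5-style cycles: select and fire ONE instance per cycle until a fixpoint."""
    wm, cycles = set(wm), 0
    while True:
        instances = match(wm)
        if not instances:          # fixpoint: no instance changes WM
            return wm, cycles
        _, fact = min(instances)   # stand-in for a conflict resolution strategy
        wm.add(fact)               # act phase
        cycles += 1


def run_set_oriented(wm):
    """Set-oriented cycles: fire ALL eligible instances of a cycle together."""
    wm, cycles = set(wm), 0
    while True:
        new_facts = {fact for _, fact in match(wm)}
        if not new_facts:
            return wm, cycles
        wm |= new_facts
        cycles += 1


facts = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}
final_a, cycles_a = run_single_instance(facts)
final_b, cycles_b = run_set_oriented(facts)
print(cycles_a, cycles_b)  # 6 3
```

Both runners reach the same fixpoint; the set-oriented one needs fewer cycles because each cycle adds every derivable fact at once.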

In rule-based systems, the stored facts define the extensional database (EDB),
while the rules define the "derived" or intensional database (IDB). The IDB may
be regarded as a view on the database, and this view can be either computed
dynamically as needed, or stored in some fashion. The rules may refer to any
combination of EDB or IDB relations. In our work, "knowledge-base systems"
refers loosely to systems consisting of a rule-based program together with an
underlying database of facts about its domain. The rules encode domain
knowledge, and a well-defined operational semantics enables inference of new
knowledge (the IDB) from the existing body of facts (the EDB).

3 Load Distribution by Copy-and-Constrain

In PARADISER, workload is distributed by dividing the effort involved in rule
matching among the processing sites through the use of the copy-and-constrain
paradigm [Stolfo et al., 1985a; Wolfson and Ozeri, 1990]. In practice, rules are
replicated with additional constraints attached to each copy. Such restricted
rules can be matched in parallel, thus providing a speedup [Stolfo et al., 1985;
Pasik, 1987]. To illustrate, let rule $r$ be

$$r:\; C_1 \wedge C_2 \wedge \cdots \wedge C_n \rightarrow A_1, \ldots, A_m, \qquad n, m \ge 1$$

where the $C_i$, $i = 1, \ldots, n$, are conditions selecting a subset of WM, and
$A_j$, $j = 1, \ldots, m$, is a set of $m$ actions (updates to the database).
Let $R(C_i)$ denote the tuples selected by $C_i$. The work, $W(r)$, to process
$r$ is bounded by the size of the cross product of the tuples selected on the
LHS of $r$:

$$W(r) \le |R(C_1)| \times |R(C_2)| \times \cdots \times |R(C_n)|$$

Suppose we choose to copy-and-constrain rule $r$ on condition $C_i$ to produce
$k$ new conditions $\{C_i^1, C_i^2, \ldots, C_i^k\}$ and $k$ new replicas of
$r$, $\{r^1, r^2, \ldots, r^k\}$, where each replica $r^j$ has $C_i$ replaced by
$C_i^j$, $j = 1, \ldots, k$, on the LHS. If the new conditions are chosen such
that

$$|R(C_i^j)| \approx \frac{|R(C_i)|}{k} \ (j = 1, \ldots, k), \qquad R(C_i^j) \cap R(C_i^l) = \emptyset \ (j \ne l), \qquad \bigcup_{j=1}^{k} R(C_i^j) = R(C_i),$$

then

$$W(r^1) \approx W(r^2) \approx \cdots \approx W(r^k) \le |R(C_1)| \times \cdots \times \frac{|R(C_i)|}{k} \times \cdots \times |R(C_n)|$$

For appropriately chosen conditions, each of the $k$ replicas requires $1/k$ of
the work of the original rule $r$, and together the replicas form the same set
of instantiations. If the replicas are processed in parallel, the evaluation of
$r$ is sped up by a factor of $k$.
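The bound above can be checked on a toy rule. This Python sketch assumes a hypothetical rule r: A(x), B(y) -> C(x, y), counts work as the size of the inspected cross product, and constrains A(x) with k equal-sized, disjoint range restrictions; the function names are ours, not PARADISER's.

```python
from itertools import product


def instantiate(a_values, b_values, restriction=lambda x: True):
    """Instances of a hypothetical rule r: A(x), B(y) -> C(x, y).

    `restriction` is the constraint copy-and-constrain attaches to the A(x)
    condition; the default (always true) gives the original, unconstrained rule.
    Work is counted as the size of the cross product the rule must inspect.
    """
    selected = [x for x in a_values if restriction(x)]
    work = len(selected) * len(b_values)
    return set(product(selected, b_values)), work


def copy_and_constrain(a_values, k):
    """Range restrictions on A.x that split the distinct values into k equal chunks.

    Assumes the number of distinct values is a multiple of k.
    """
    xs = sorted(set(a_values))
    size = len(xs) // k
    bounds = [(xs[j * size], xs[(j + 1) * size - 1]) for j in range(k)]
    # Default arguments bind lo/hi per replica (avoids late-binding closures).
    return [lambda x, lo=lo, hi=hi: lo <= x <= hi for lo, hi in bounds]


A = list(range(1, 13))   # 12 tuples A(1) .. A(12)
B = [10, 20, 30]         # 3 tuples of B
original, w_original = instantiate(A, B)
replicas = [instantiate(A, B, c) for c in copy_and_constrain(A, 4)]
print(w_original, [w for _, w in replicas])  # 36 [9, 9, 9, 9]
```

Each replica inspects a quarter of the cross product, and the union of the replicas' instances equals the original rule's instances.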

3.1  Specifying  Constraints 

Four principal methods may be used for deriving constraints (the $C_i^j$ in our
previous example) to restrict the data matched by a rule under the
copy-and-constrain paradigm. In the first case, if a free variable $x$ in a rule
condition corresponds to the restriction attribute (RA), and $x$ takes values
from a finite enumerable set of constants $V$, we may construct the replicas by
restricting $x$ to subsets of $V$. Second, if there is an easily computable hash
function on the RA, we can apply hash partitioning. The third method applies if
the RA has a totally ordered domain. We can then simply divide the range of the
RA directly into disjoint subranges. The fourth scheme, which generalizes the
preceding three, is based upon data clustering. Here, a rule is constrained by
restricting it to fragments of a base relation $R$. The fragments are
constructed on the basis of a distance metric defined for the tuples of $R$, by
forming minimum distance clusters around several centroid tuples. For
simplicity, we restrict our discussion in this paper to the case where the RA
draws its values from an ordered set, either directly or through a simple
mapping, allowing restrictions to be expressed as simple arithmetic predicates.
However, the techniques we develop carry over to the general case as well.
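The first three methods can be sketched as Python functions that each return one predicate per replica; a valid set of constraints must place every RA value in exactly one replica. The function names are hypothetical, `zlib.crc32` is merely a convenient stable hash, and the clustering method is not shown.

```python
import zlib


def enumerated_constraints(values, k):
    """Method 1: split a finite set V of constants into k disjoint subsets."""
    ordered = sorted(values)
    subsets = [set(ordered[j::k]) for j in range(k)]
    return [lambda x, s=s: x in s for s in subsets]


def hash_constraints(k):
    """Method 2: hash partitioning; replica j keeps values with h(x) mod k == j."""
    # zlib.crc32 is deterministic across runs, unlike Python's salted str hash.
    return [lambda x, j=j: zlib.crc32(repr(x).encode()) % k == j for j in range(k)]


def range_constraints(cuts):
    """Method 3: disjoint subranges cuts[j] <= x < cuts[j + 1] of an ordered domain."""
    return [lambda x, lo=lo, hi=hi: lo <= x < hi for lo, hi in zip(cuts, cuts[1:])]


def is_partition(constraints, domain):
    """True if every domain value satisfies exactly one replica's constraint."""
    return all(sum(c(x) for c in constraints) == 1 for x in domain)


domain = list(range(100))
print(is_partition(enumerated_constraints(domain, 3), domain),
      is_partition(hash_constraints(4), domain),
      is_partition(range_constraints([0, 25, 50, 100]), domain))  # True True True
```

Only the range form yields the simple arithmetic predicates the paper restricts itself to, which is also what makes the histogram-based splitting of Section 4 possible.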

3.2  Heuristics  for  Choosing  Restriction  Attributes 

To balance the load of distributed rule processing, we need to estimate the
amount of computation required to execute a rule program against a particular
database. However, the database in question may not be known at compile time.
For example, the IDB distribution is not known initially. (In special cases, it
may be possible to estimate the size of the IDB. See [Wolfson et al., 199?].)
Moreover, the EDB relations often cannot be exhaustively analyzed, since they
may be very large. This suggests a heuristic approach for choosing RAs. The
heuristic process we use for identifying the RA for a rule $r$ proceeds in two
phases. In the first phase, a set of possible RAs, called the RA Candidate Set,
is identified. Suitable RA candidates are determined on the basis of the degree
of uniformity of the distribution of the attribute value, as well as the size of
the underlying relation. In the second phase, the candidate set is reduced to a
single chosen restriction attribute. Space limitations do not permit a full
description of the process in this paper. The reader is encouraged to see
[Dewan and Stolfo, 1993] for details.

4  Dynamic  Reorganization  in  PARADISER 

In PARADISER, statistics kept in system catalogs play a key role in workload
partitioning for the distributed evaluation of general rule programs. The
catalogs maintain the frequency distribution of tuples according to the value of
the RA for a given relation. This is called the discrete frequency distribution
(DFD), and it is maintained as a discrete histogram. The DFD can be used to
derive restriction predicates that select a subset of the tuples of a relation
according to any desired workload schedule, called loading weights. A
sophisticated facility creates the DFDs at compile time and automatically
maintains them, using triggers, as the database changes at runtime.
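A DFD is essentially a per-value counter. The Python class below is a minimal sketch: the constructor plays the role of the compile-time build, and the hypothetical `on_insert`/`on_delete` hooks stand in for the database triggers; none of these names come from PARADISER.

```python
from collections import Counter


class DiscreteFrequencyDistribution:
    """Per-value tuple counts for one relation's restriction attribute (RA).

    Built once from the RA values (the "compile time" step); on_insert and
    on_delete are hypothetical hooks standing in for database triggers.
    """

    def __init__(self, ra_values=()):
        self.counts = Counter(ra_values)

    def on_insert(self, value):
        self.counts[value] += 1

    def on_delete(self, value):
        self.counts[value] -= 1
        if self.counts[value] <= 0:
            del self.counts[value]   # keep only values that still occur

    def total(self):
        return sum(self.counts.values())

    def sorted_items(self):
        """(value, count) pairs in RA order, the input range splitting needs."""
        return sorted(self.counts.items())


dfd = DiscreteFrequencyDistribution([1, 2, 2, 5])
dfd.on_insert(5)
dfd.on_delete(1)
print(dfd.sorted_items(), dfd.total())  # [(2, 2), (5, 2)] 4
```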

4.1  The  Isoweight  Copy-and-Constrain  (ICC)  Algorithm 

The procedure for deriving restrictions is to compute subranges such that each
subrange contains a desired fraction of the total number of tuples of the
relation from which the restriction attribute is drawn. This uses a technique
called Weighted Range Splitting (WRS). WRS divides the range of the RA into
subranges such that the number of tuples, according to the DFD, in each
subrange is proportional to the corresponding desired loading weight. Given the
loading weights $q_i$, $i = 1, \ldots, P$, for a set of $P$ processors,
subranges can be computed in a single scan of a data structure that maintains
the DFD. In ICC, the loading weights are initially chosen to be all equal,
i.e., $q_i = 1/P$, $i = 1, \ldots, P$. In contrast, DRA adjusts the weights
dynamically by observing the performance of individual processing sites during
system operation.
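The single scan described above might look like the following Python sketch. The greedy rule for assigning each RA value to a site, the `(value, count)` input format, and the function names are our assumptions; the text does not say how rounding is handled when a share does not fall exactly on a value boundary.

```python
def weighted_range_split(dfd_items, weights):
    """Weighted Range Splitting in a single scan of the DFD.

    dfd_items: (value, count) pairs sorted by RA value.
    weights:   loading weights q_1..q_P, assumed to sum to 1.
    Returns one inclusive (lo, hi) subrange per site, or None for a site that
    receives no values. Assignment is greedy: a value never straddles sites,
    so shares are only approximately proportional when counts are lumpy.
    """
    total = sum(count for _, count in dfd_items)
    targets, acc = [], 0.0
    for q in weights:            # cumulative tuple count where each share ends
        acc += q
        targets.append(acc * total)
    ranges = [None] * len(weights)
    site, seen = 0, 0
    for value, count in dfd_items:
        lo = value if ranges[site] is None else ranges[site][0]
        ranges[site] = (lo, value)
        seen += count
        # Advance once this site's target is met; the last site takes the rest.
        while site < len(weights) - 1 and seen >= targets[site] - 1e-9:
            site += 1
    return ranges


def icc_ranges(dfd_items, num_sites):
    """ICC: WRS with equal loading weights q_i = 1/P."""
    return weighted_range_split(dfd_items, [1.0 / num_sites] * num_sites)


skewed = [(1, 5), (2, 1), (3, 1), (4, 1), (7, 2)]   # 10 tuples, value 1 dominates
print(icc_ranges(skewed, 2))   # [(1, 1), (2, 7)]
```

The skewed DFD shows why equal tuple counts can mean very unequal value spans: one site gets the single value 1, the other gets everything from 2 to 7.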

4.2 The Dynamic Restriction Adjustment (DRA) Algorithm

A dynamic load balancing protocol may consider all the sites in the ensemble of
processors, or independent groups of sites from the ensemble. In general, a
particular protocol may divide the ensemble into one or more groups before
applying the strategies it encodes. Here, we outline the DRA algorithm, which
computes the adjustments to the restriction predicates attached to the rules of
the program versions in a group. The adjustment of the restriction predicate on
a particular rule version $r_s$ of some rule $r$ at a site $s$ is initiated only
if the completion time of $r_s$ differs from the average completion time of $r$
over all sites in the group by more than some specified threshold.

In DRA, the restrictions on rule versions are adjusted using loading weights
$q_s^r$, computed at runtime for each rule $r$ at site $s$, which divide the DFD
by way of the WRS technique. While ICC initially divides the DFD of items
equally among the sites, DRA uses loading weights that are computed from
observations of runtime performance. The idea is to assign greater loading
weights to faster sites, and vice versa.
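The text fixes the trigger condition and the direction of the adjustment but not the formula for the new weights. The Python sketch below assumes one plausible rule: estimate each site's speed as its current weight divided by its completion time and make the new weights proportional to speed. It also simplifies the trigger by re-weighting the whole group when any site deviates from the mean by more than the threshold.

```python
def dra_weights(old_weights, completion_times, threshold):
    """Recompute one rule's loading weights across the sites of a group.

    Assumptions not fixed by the text: a site's speed is estimated as
    old_weight / completion_time, new weights are proportional to speed, and
    the whole group is re-weighted if ANY site deviates from the group's mean
    completion time by more than `threshold`.
    """
    mean = sum(completion_times) / len(completion_times)
    if all(abs(t - mean) <= threshold for t in completion_times):
        return list(old_weights)          # balanced enough: keep restrictions
    speeds = [w / t for w, t in zip(old_weights, completion_times)]
    total = sum(speeds)
    return [s / total for s in speeds]    # faster sites get larger weights


print(dra_weights([0.5, 0.5], [1.0, 2.0], threshold=0.1))  # [0.6666666666666666, 0.3333333333333333]
```

The resulting weights would then be fed back through WRS to produce the new restriction predicates.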


4.3  An  Example 

As an example, consider a simple rule program $\Pi = \{r_1, r_2\}$, and two
sites $\{S_1, S_2\}$ over which the processing of $\Pi$ is to be distributed.
Let the replicated database be as follows:


{A(1), A(2), A(3), A(5), A(10), A(25), B(1), B(…), D(1), D(…)}

Suppose that $x$ in relation $A$ of both rules is chosen as the RA. Then ICC may
produce the following workload partition, as evident in the restriction
predicates on the LHS of the rules:

$(S_1)\; r_1 : A(x), B(y), (1 \le x \le 3) \rightarrow \ldots$

$(S_1)\; r_2 : A(x), D(y), (1 \le x \le 3) \rightarrow \ldots$

$(S_2)\; r_1 : A(x), B(y), (5 \le x \le 25) \rightarrow \ldots$

$(S_2)\; r_2 : A(x), D(y), (5 \le x \le 25) \rightarrow \ldots$

Note that each subrange specified by the restriction predicates includes the
same number of tuples, and that ICC is sensitive to the nonuniformity of the
distribution of the values over the domain span of $A.x$. Suppose unbalanced
operation is detected by the dynamic load balancing protocol. Consequently, DRA
is applied to compute adjustments to the restrictions. Suppose that $S_1$ and
$S_2$ had completion times $t$ and $2t$, respectively. The DRA algorithm would
adjust the restrictions so that $S_1$ now has twice as much work as $S_2$:

$(S_1)\; r_1 : A(x), B(y), (1 \le x \le 5) \rightarrow \ldots$

$(S_1)\; r_2 : A(x), D(y), (1 \le x \le 5) \rightarrow \ldots$

$(S_2)\; r_1 : A(x), B(y), (10 \le x \le 25) \rightarrow \ldots$

$(S_2)\; r_2 : A(x), D(y), (10 \le x \le 25) \rightarrow \ldots$
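Assuming DRA sets the new loading weights proportional to each site's observed speed (one plausible reading of "greater loading weights to faster sites"; the text gives no formula), the adjusted partition follows from WRS over the six tuples of $A$:

$$q_1 = \frac{1/t}{1/t + 1/(2t)} = \frac{2}{3}, \qquad q_2 = \frac{1/(2t)}{1/t + 1/(2t)} = \frac{1}{3}$$

$$6 \cdot q_1 = 4 \text{ tuples } \{A(1), A(2), A(3), A(5)\} \to S_1, \qquad 6 \cdot q_2 = 2 \text{ tuples } \{A(10), A(25)\} \to S_2$$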

5  Dynamic  Load  Balancing  (DLB)  Protocols 

We now present a collection of candidate dynamic load balancing (DLB) protocols
that may be used in the PARADISER environment. Most of the differences among the
protocols lie in the manner in which the architecture is logically partitioned
and in the particular sites that participate and cooperate with each other to
rebalance the load. We refer to the size of this "participating group" as the
group size $\Gamma$.

Global Coordinator (GC) Protocol: The Global Coordinator (GC) protocol makes
load balancing decisions based upon global information on the state of rule
program processing at all sites. Hence, the group size is $\Gamma = P$. It works
as follows:

1. At every processing cycle, each site keeps track of its local program and
rule completion times. The sites determine the globally maximum (MAX) and
minimum (MIN) completion times of the program replicas among all of the $P$
sites in $O(\log(P))$ steps using a tournament selection method. At most
$\log(P)$ steps after the last site completes, the coordinator site (the root of
the tournament tree) receives the MAX and MIN values. This site computes the
imbalance measure $Z = |\mathrm{MAX} - \mathrm{MIN}|$.

2. If $Z > C$ (some threshold), the global coordinator broadcasts a BALANCE
signal. All sites then send rule and program completion times to the
coordinator. For each rule at each site, the coordinator computes adjustments
to the restrictions using the DRA algorithm and sends these to each site, where
they are incorporated into the local program replicas. The coordinator then
broadcasts a CYCLE signal.

3. On receiving a broadcast CYCLE signal, each site begins a new processing
cycle, starting with the match phase.

4. Termination is detected by the coordinator when every site reaches a
fixpoint.
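Step 1 relies on a tournament reduction. The Python sketch below simulates it: per-site completion times are combined pairwise, one round at a time, until the root holds MAX and MIN, and the round count confirms the $\lceil \log_2 P \rceil$ bound. The function names and the plain `threshold` argument for $C$ are illustrative assumptions.

```python
def tournament_max_min(times):
    """Combine per-site completion times pairwise, as in a tournament tree.

    Each round halves the number of (max, min) candidates, so P values reach
    the root (the coordinator) after ceil(log2 P) rounds.
    Returns (MAX, MIN, rounds).
    """
    level = [(t, t) for t in times]
    rounds = 0
    while len(level) > 1:
        level = [
            (max(p[0] for p in level[i:i + 2]), min(p[1] for p in level[i:i + 2]))
            for i in range(0, len(level), 2)
        ]
        rounds += 1
    return level[0][0], level[0][1], rounds


def gc_should_balance(times, threshold):
    """Coordinator test: broadcast BALANCE if Z = |MAX - MIN| exceeds C."""
    mx, mn, rounds = tournament_max_min(times)
    return abs(mx - mn) > threshold, rounds


times = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]   # 8 sites
print(tournament_max_min(times))                  # (9.0, 1.0, 3)
print(gc_should_balance(times, threshold=2.0))    # (True, 3)
```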

Round Robin (RR) Protocol: The Round Robin (RR) protocol makes load balancing
decisions for groups of size $\Gamma = 2$. Each processing site has a distinct
"paired" site, which changes in a cyclic fashion at each invocation of the load
balancing algorithm. The imbalance is computed pairwise between a site and its
paired "target", and balancing is initiated on groups of size $\Gamma = 2$ using
the DRA algorithm. Global synchronization is still performed by a coordinator
site, which uses a simple counting mechanism to determine in logarithmic time
whether all sites have completed their balancing actions. The details of the
protocol are omitted here due to space limitations.
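Since the protocol details are omitted, here is one way a deterministic, cyclically changing pairing could be produced. This Python sketch uses the classic circle method for round-robin tournaments, which is our assumption rather than the authors' scheme; it requires an even number of sites and cycles through every possible pair.

```python
def round_robin_pairs(num_sites, invocation):
    """Return partner[i] for every site at a given load balancing invocation.

    Circle method (an assumed scheme): site num_sites-1 stays fixed while the
    others rotate by one position per invocation. After num_sites - 1
    invocations every site has met every other site once, then it repeats.
    """
    if num_sites % 2:
        raise ValueError("this sketch assumes an even number of sites")
    r = invocation % (num_sites - 1)
    ring = [(i + r) % (num_sites - 1) for i in range(num_sites - 1)] + [num_sites - 1]
    partner = [0] * num_sites
    for i in range(num_sites // 2):
        a, b = ring[i], ring[num_sites - 1 - i]   # pair opposite ends of the ring
        partner[a], partner[b] = b, a
    return partner


print(round_robin_pairs(4, 0), round_robin_pairs(4, 1), round_robin_pairs(4, 2))
# [3, 2, 1, 0] [2, 3, 0, 1] [1, 0, 3, 2]
```

Each site can locate its own target with a constant amount of arithmetic, consistent with the $O(1)$ target computation assumed for RR in Section 7.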

Random Probing (RP) Protocol: As in RR, the Random Probing (RP) protocol makes
load balancing decisions for groups of size $\Gamma = 2$. Each site has a
"paired" site, which changes in a random fashion at each invocation of the load
balancing algorithm. In contrast, RR changes the pairing of processing sites in
a deterministic fashion. In other respects, this protocol is similar to RR.
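A random pairing can be built by probing, which is where RP's extra cost comes from. In this Python sketch (the probing order and the self-pairing of a leftover site are our assumptions), each unpaired site draws random sites until it hits one that is still free; since the chance of success shrinks as sites get paired, the total probe count grows roughly like $P \log P$.

```python
import random


def random_probe_pairs(num_sites, rng):
    """Pair sites at random; each unpaired site probes random sites until one is free.

    Returns (partner list, total probes). With an odd number of sites, the site
    left without a free target is paired with itself (it skips balancing).
    When f sites remain free, a probe succeeds with probability about f / P,
    so the expected total number of probes grows roughly like P log P.
    """
    partner = [None] * num_sites
    probes = 0
    for site in range(num_sites):
        if partner[site] is not None:
            continue
        if not any(partner[s] is None for s in range(num_sites) if s != site):
            partner[site] = site
            continue
        while True:
            target = rng.randrange(num_sites)
            probes += 1
            if target != site and partner[target] is None:
                break
        partner[site], partner[target] = target, site
    return partner, probes


pairs, probes = random_probe_pairs(8, random.Random(42))
print(pairs, probes)
```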
Log(P) Nearest Neighbor (LNN) Protocol: This protocol is similar to GC.
However, each site confines its balancing actions to a group of $\log(P)$ sites,
where the total number of sites is $P$. Thus, the group size is
$\Gamma = \log(P)$. The protocol includes a simple means of partially shuffling
the members of each group in a random fashion prior to the load balancing phase.
A BALANCE signal is sent from one "representative" site in each group to a
coordinator. The coordinator can still broadcast to every site. This protocol
has lower communication overhead than GC, but the quality of load balancing may
not be as high.
Sqrt(P) Nearest Neighbor (SNN) Protocol: This protocol is similar to LNN.
However, each site confines its balancing actions to a group of $\sqrt{P}$
sites, where the total number of sites is $P$. Thus, the group size is
$\Gamma = \sqrt{P}$. In all other respects, it is the same as the LNN protocol.
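Group formation for LNN and SNN can be sketched in Python as a random shuffle followed by cutting the site list into consecutive groups of about $\log_2 P$ or $\sqrt{P}$ sites. The full shuffle is a simplification of the "partial" shuffling the text mentions, and the function names are hypothetical.

```python
import math
import random


def make_groups(num_sites, group_size, rng):
    """Shuffle site ids at random, then cut them into consecutive groups."""
    sites = list(range(num_sites))
    rng.shuffle(sites)
    return [sites[i:i + group_size] for i in range(0, num_sites, group_size)]


def lnn_groups(num_sites, rng):
    """LNN: groups of about log2(P) sites (at least 2)."""
    return make_groups(num_sites, max(2, math.ceil(math.log2(num_sites))), rng)


def snn_groups(num_sites, rng):
    """SNN: groups of about sqrt(P) sites (at least 2)."""
    return make_groups(num_sites, max(2, math.isqrt(num_sites)), rng)


rng = random.Random(0)
print([len(g) for g in lnn_groups(64, rng)])   # ten groups of 6, then one of 4
print([len(g) for g in snn_groups(64, rng)])   # eight groups of 8
```

One member of each group (for example, its first element) could then act as the representative that reports BALANCE to the coordinator.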

6 Framework for Comparing the DLB Protocols

The relative merits of various dynamic load balancing strategies are difficult
to judge, since the usual performance metrics used to evaluate such systems are
sensitive to numerous parameters, e.g., network topology, CPU speed, speed of
I/O channels, problem size, and so on. Changing any one of these parameters
tends to invalidate any conclusions that one might draw from samples of
performance data. To circumvent these problems, the isoefficiency metric was
introduced in [Kumar and Rao, 1987]. This measure relates problem size to the
number of processors such that the efficiency of the parallel or distributed
computing ensemble remains constant. Isoefficiency analysis aims to capture the
effects of a particular algorithm-architecture combination in a single
expression as a measure of scalability.

A parallel algorithm is scalable if the efficiency can be maintained at some
fixed value (between 0 and 1) by increasing the problem size $W$ as some
function $f$ of the number of processors $P$. For example, if the problem size
$W$ must grow exponentially with $P$ to maintain a fixed value of efficiency,
then the algorithm is ill-suited to that architecture, since to obtain
reasonable speedups, the problem size must grow at an exponential rate with
respect to the number of processors. A good measure of how well a particular
algorithm performs on a given architecture is how close to linear the function
$f$ is. Linear isoefficiency functions indicate algorithms that are "highly
scalable" on a given architecture. It can be shown that $f = \Omega(P)$ is a
lower bound on any isoefficiency function. Hence, any isoefficiency function
that is linear or close to linear (e.g., $f = O(P \log^c P)$ for some integer
$c$) is highly desirable. In the following discussion, we develop the basic
framework for analyzing the dynamic load balancing protocols presented in the
previous section in terms of the scalability criterion and isoefficiency, based
on the work of Kumar et al. We assume a collection of $P$ networked
workstations as the computing architecture.

Definitions:

1. $W$: Problem size, or the measure of computation needed to solve a problem
instance.

2. $P$: Number of processors in the ensemble, all with identical
characteristics.

3. $t_c$: Time taken for one unit of computational work. In the case of
database rule processing, this may be taken to be a logical read of a page from
disk for the purpose of matching a rule against the database using relational
operations such as select, project, and join.

4. $T_e$: Sum over all processors of the time spent in computation to solve the
problem, exclusive of communication and other overheads.

5. $T_p$: Running (elapsed) time for a given problem on a $P$-processor
ensemble.

6. $T_o$: Sum over all processors of the time spent in overhead activities such
as interprocessor communication, waiting for messages, etc.

7. $S$: The speedup, or gain in computation speed, from using $P$ processors. It
is given by $S = T_e / T_p$.

8. $E$: The efficiency of the distributed computing ensemble on a problem
instance. Intuitively, it is a measure of the utilization of computing
resources for the problem at hand:

$$E = \frac{S}{P} = \frac{T_e}{T_p \cdot P} = \frac{T_e}{T_e + T_o} = \frac{1}{1 + T_o / T_e}$$
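The chain of identities for $E$ can be checked numerically. This Python sketch uses made-up values for $T_e$, $T_o$ and $P$ and relies only on the relation $P \cdot T_p = T_e + T_o$ implied by the definitions.

```python
def speedup_and_efficiency(t_e, t_o, p):
    """Return (T_p, S, E) using the definitions above, where P * T_p = T_e + T_o."""
    t_p = (t_e + t_o) / p      # elapsed time on the P-processor ensemble
    s = t_e / t_p              # speedup S = T_e / T_p
    e = s / p                  # efficiency E = S / P
    return t_p, s, e


t_p, s, e = speedup_and_efficiency(t_e=900.0, t_o=100.0, p=10)
print(t_p, s, e)               # 100.0 9.0 0.9
```

Here 10% of the processor time goes to overhead, so $E = 1/(1 + 100/900) = 0.9$, matching the last form of the identity.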

In order to perform isoefficiency analysis, we proceed as follows. Observe that
contributions to the overhead can come from communication among the sites, time
spent waiting for messages or responses from other sites, and starvation. We
assume that communication costs subsume all other overhead costs (which is
reasonable in the presence of disk-resident data, which we fundamentally
presume). In order for the distributed system to meet the isoefficiency
criterion, we must have $E = \gamma$ for some constant $\gamma$. From the
definition of $E$, this implies $T_e / (P \cdot T_p) = \gamma$. Since
$T_e = t_c \cdot W$ and $t_c$ is fixed, we must have

$$W \propto P \cdot T_p$$

Our approach to establishing an expression for $W$ as a function of $P$ is to
derive a lower bound on $T_p$ in terms of the number of processors $P$. In the
next section, we treat each of the dynamic load balancing protocols we have
previously presented.




7 Isoefficiency Analysis

Global  Coordinator  (GC): 

In this protocol, $\log(P)$ steps are needed to compute the global maximum and
minimum of the completion times. In the worst case, the computed value of the
imbalance $Z$ would initiate load balancing actions at the end of every cycle.
Then each site communicates local information (e.g., completion times) to the
coordinator. The coordinator computes adjusted range restrictions and
communicates them to each site in $P$ steps, since at least one message is
necessary for every site. Thus, the worst-case lower bound on $T_p$, computed on
the basis of the requirements imposed by the load balancing protocol, is
$\Omega(\log(P) + P)$. Then we can write:

$$W \propto P\,(\log(P) + P) \;\Rightarrow\; W = c_1 P \log(P) + c_2 P^2$$

where $c_1, c_2$ are arbitrary constants. If load balancing is seldom required,
$c_2$ is expected to be small, and $W = O(P \log(P))$; otherwise, it is
$O(P^2)$.

Round Robin (RR):

For this protocol, computing the target for each source site at every cycle is
an $O(1)$ operation. Similarly, sending local completion time data to the
target and receiving back the adjusted range information from the target is
also done in $O(1)$ steps. The number of sites, "COUNT", that will enter a
balancing phase can be communicated to the coordinator within $O(\log(P))$ steps
from the time the last site has completed its match phase, using a tournament
algorithm. Using the same technique, the coordinator can be informed that the
last site has completed its balancing actions in $O(\log(P))$ steps. Thus, the
coordinator waits $O(\log(P))$ time before it can broadcast a message signaling
the next cycle of rule processing to begin. If we assume that the sites are
roughly matched, a lower bound on $T_p$ is $\Omega(\log(P))$. Hence, for some
arbitrary constant $c$,

$$W \propto P \log(P) \;\Rightarrow\; W = c\,P \log(P)$$

Random Probing (RP):

For this protocol, computing the target for each site at every cycle requires an average-case analysis of the number of random probes necessary before a probe is successful in finding a free target. It can be shown using a straightforward analysis [Kumar et al., 1991] that $O(P \log(P))$ probes are needed. As in the case of RR, once the target has been chosen, sending local completion time data to the target and receiving back the adjusted range information from the target can be done in $O(1)$ steps. Also as in RR, the coordinator knows within $O(\log(P))$ steps from the time the last site has finished its balancing actions that it can broadcast a signal to start a new cycle. Thus, the lower bound on $T_p$ is given by the dominating term, $\Omega(P \log(P))$. Hence, for some arbitrary constant $c$,

$W \propto P \cdot P \log(P) \Rightarrow W = c P^2 \log(P)$
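The text cites [Kumar et al., 1991] for the $O(P \log(P))$ probe count but does not reproduce the analysis. The sketch below adopts a coupon-collector model as an assumption (it may differ from the model used in that report): each probe hits one of $P$ sites uniformly at random, and we want the expected number of probes until every site has been hit at least once. That expectation is $P \cdot H_P$, where $H_P$ is the $P$-th harmonic number, so it grows as $\Theta(P \log P)$. The function name is hypothetical.

```python
from fractions import Fraction

def expected_probes_to_cover(p):
    """Expected number of uniform random probes until all p sites have been
    probed at least once (coupon-collector model): p * H_p, computed
    exactly with rational arithmetic. H_p lies between ln p and ln p + 1,
    so the result grows as Theta(p log p)."""
    harmonic = sum(Fraction(1, k) for k in range(1, p + 1))
    return p * harmonic
```

For example, `float(expected_probes_to_cover(1024))` can be compared with `1024 * math.log(1024)` to see the $P \log P$ growth.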

Log(P) Nearest Neighbor (LNN):

Using an analysis similar to that for GC, but using a set of $\log(P)$ sites, we can derive a lower bound on $T_p$ as follows. The contribution to $T_p$ from the balancing actions alone from a group of size $\log(P)$ is $\Theta(\log(\log(P)) + \log(P))$. Ignoring the double logarithm, this contribution is $O(\log(P))$. A representative site from each group communicates the READY signal to the coordinator, which then broadcasts a message signaling the continuation of normal processing. Since there are $P / \log(P)$ representatives, the READY signal from all representatives can be detected by the coordinator in $O(\log(P / \log(P))) = O(\log(P))$ time. The broadcast of the CYCLE signal takes $O(1)$ time. Thus, the lower bound on $T_p$ is $\Omega(\log(P))$, and for some arbitrary constant $c$, we have

$W \propto P \log(P) \Rightarrow W = c P \log(P)$

√P Nearest Neighbor (SNN):

Using an analysis similar to the one for LNN, we see that the contribution to $T_p$ from the balancing actions alone from a group of size $\sqrt{P}$ is $\Theta(\log(\sqrt{P}) + \sqrt{P})$. Asymptotically, this contribution is $O(\sqrt{P})$. A representative site from each group communicates the READY signal to the coordinator, which then broadcasts a CYCLE signal. Since there are $P / \sqrt{P} = \sqrt{P}$ representatives, the READY signal from all representatives can be detected by the coordinator in $O(\log(\sqrt{P})) = O(\log(P))$ time. The broadcast of the CYCLE signal takes $O(1)$ time. Thus, the lower bound on $T_p$ is $\Omega(\sqrt{P})$, and for some arbitrary constant $c$, we have

$W \propto P \sqrt{P} \Rightarrow W = c P^{3/2}$
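To compare the five isoefficiency functions derived above side by side, the sketch below evaluates each one at a given $P$. It sets every arbitrary constant to 1 and uses base-2 logarithms. Both choices are assumptions made only to compare growth rates; the real constants depend on how often DRA triggers rebalancing. A slower-growing $W$ means the protocol scales better.

```python
import math

# Isoefficiency functions W(P) from Section 7, with every arbitrary constant
# set to 1 (an assumption made only to compare growth rates).
ISOEFFICIENCY = {
    "GC":  lambda p: p * math.log2(p) + p ** 2,   # c1 P log P + c2 P^2
    "RR":  lambda p: p * math.log2(p),            # c P log P
    "RP":  lambda p: p ** 2 * math.log2(p),       # c P^2 log P
    "LNN": lambda p: p * math.log2(p),            # c P log P
    "SNN": lambda p: p ** 1.5,                    # c P^(3/2)
}

def rank_protocols(p):
    """Protocol names ordered by required work W at p sites, smallest
    (most scalable) first; ties keep the dictionary order."""
    return sorted(ISOEFFICIENCY, key=lambda name: ISOEFFICIENCY[name](p))
```

At $P = 1024$ the ranking places RR and LNN first, followed by SNN, GC, and RP. This agrees with the conclusion that follows.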


8  Conclusion 

The analysis indicates clearly that for large numbers of processors, which we realistically assume will be commonplace in the foreseeable future, system reorganization is best implemented by either the Round Robin (RR) or Log Nearest Neighbor (LNN) protocol. In either protocol, the effect of the constant factor c on scalability is clearly crucial. The frequency of imbalance depends on how well the DRA algorithm behaves in practice. If DRA does a poor job of predicting the amount of computation required by a particular program replica at a processing site, or the computational load placed on a site by some external application, then dynamic reorganization will be frequent and costly. It will therefore be important to design a scalable system that minimizes these costs as much as possible. Thus, the RR or LNN protocols would be appropriate candidates for implementation.

In our future work, we aim to study the behavior of the ICC and DRA algorithms under realistic problem scenarios. The frequency of application of DRA depends on the particular problem and the distribution of data. The heuristics employed in ICC and DRA are intended to provide a rapid decision in distributing the workload at runtime, and they may not work well in certain cases. The types of problems that pose challenges for the methods we have developed will be the focus of our attention in future work.




References 

[Dewan and Stolfo, 1993] H.M. Dewan and S.J. Stolfo. The Distributed Evaluation of Rules in PARADISER. Technical Report in preparation, Department of Computer Science, Columbia University, May (expected) 1993.

[Dewan et al., 1992] H.M. Dewan, D. Ohsie, S.J. Stolfo, O. Wolfson, and S. DaSilva. Incremental Database Rule Processing in PARADISER. Journal of Intelligent Information Systems, 1:2, October 1992.

[Hanson and Widom, 1992] E.N. Hanson and J. Widom. An Overview of Production Rules in Database Systems. Technical Report RJ 9029 (80189), IBM Research Division, October 1992.

[Kumar and Rao, 1987] V. Kumar and V.N. Rao. Parallel Depth-First Search. Part II: Analysis. International Journal of Parallel Programming, 16(6):501-519, 1987.

[Kumar et al., 1991] V. Kumar, A.Y. Grama, and V.N. Rao. Scalable Load Balancing Techniques for Parallel Computers. Technical Report TR 91-55, Department of Computer Science, University of Minnesota, September 1991.

[Ohsie et al., 1993] D. Ohsie, H.M. Dewan, S.J. Stolfo, and S. DaSilva. Performance of Incremental Update in Database Rule Processing. February 1993. Submitted to the 19th VLDB Conference.

[Pasik, 1987] A. Pasik. Improving Production System Performance on Parallel Architectures by Creating Constrained Copies of Rules. Technical Report CUCS-313-87, Department of Computer Science, Columbia University, 1987.

[Stolfo et al., 1985] S. Stolfo, D.P. Miranker, and R. Mills. A Simple Processing Scheme to Extract and Load Balance Implicit Parallelism in the Concurrent Match of Production Rules. In Proc. of the AFIPS Symposium on Fifth Generation Computing, 1985.

[Stolfo et al., 1991] S. Stolfo, O. Wolfson, P. Chan, H. Dewan, L. Woodbury, J. Glazier, and D. Ohsie. PARULEL: Parallel Rule Processing Using Meta-rules for Redaction. J. Parallel and Distrib. Computing, 13(4):366-382, 1991.

[Stolfo, 1984] S.J. Stolfo. Five Parallel Algorithms for Production System Execution on the DADO Machine. In Proc. AAAI Conf., AAAI, 1984.

[Wolfson and Ozeri, 1990] O. Wolfson and A. Ozeri. A New Paradigm for Parallel and Distributed Rule-processing. In Proc. ACM-SIGMOD, 1990.

[Wolfson et al., 1991] O. Wolfson, H. Dewan, S. Stolfo, and Y. Yemini. Incremental Evaluation of Rules and its Relationship to Parallelism. In Proc. of the ACM-SIGMOD 1991 Intl. Conf. on the Management of Data, 1991.

[Wolfson et al., 1993] O. Wolfson, W. Zhang, H. Butani, A. Kawaguchi, and K. Mok. A Methodology for Evaluating Parallel Graph Algorithms and its Application to Single Source Reachability. To appear in Proceedings of PDIS-93, 1993.


Using  Semantic  Information  for  Processing 
Negation  and  Disjunction  in  Logic  Programs 


Terry Gaasterland¹ and Jorge Lobo²

¹ Mathematics and Computer Science Division, Argonne National Laboratory. gaasterland@mcs.anl.gov

² Department of Electrical Engineering and Computer Science, University of Illinois at Chicago. jorge@eecs.uic.edu


Abstract. There are many applications in which integrity constraints can play an important role. An example is the semantic query optimization method developed by Chakravarthy, Grant, and Minker for definite deductive databases. They use integrity constraints during query processing to prevent the exploration of search space that is bound to fail. In this paper, we generalize the semantic query optimization method to apply to negated atoms. The generalized method is referred to as semantic compilation. We show that semantic compilation provides an alternative search space for negative query literals. We also show how semantic compilation can be used to transform a disjunctive database, with or without functions and with denial constraints without negation, into a new disjunctive database that complies with the integrity constraints.


1  Introduction 

There are many applications in which integrity constraints can play an important role. For example, before expanding a database query, it is possible to incorporate the integrity constraints into the query to obtain a semantically equivalent query. The new query contains constraint information that prevents the exploration of search space that is bound to fail. In [CGM90], a semantic query optimization method is described that compiles integrity constraints into queries based on common positive literals that occur in both the constraint and the query. Database rules are optimized at compile time in order to minimize the amount of computation that must be done at run time. If queries are known at compile time, the entire query can be optimized. Otherwise, any remaining optimization is done at run time. Based on the same principles, integrity constraints can also be used to generate cooperative and informative answers and to model user needs (see [MG90, Gaa92]).

We generalize the semantic query optimization method to apply to negated atoms. The generalized method is referred to as semantic compilation. This exploration has led to two significant results. The first result affords a new method to process negative query literals. Semantic compilation provides an alternative search space for negative query literals. Traversing the alternative search space can find answers in cases where negation-as-finite-failure and constructive negation may not. To use the negation-as-finite-failure method to expand a negative query literal, the negative literal must be ground, and the search space of the positive ground atom must be finite. To use constructive negation effectively, the literal does not need to be ground, but the search space must still be finite. Semantic compilation with negation as described in this paper allows correct answers to be found for negative literals that are not ground and whose search spaces are not finite. However, unless the integrity constraints completely describe all allowed states of the database, the answers for the negative literal may not be complete.

* Terry Gaasterland was supported by the Office of Scientific Computing, U.S. Department of Energy, under Contract W-31-109-Eng-38, and Jorge Lobo under NSF grant #IRI-9210220.

The second result applies to disjunctive deductive databases. Minker and Fernandez [FM91] noticed that there are many situations that can be represented directly with a set of disjunctive statements and a set of constraints that restrict the statements. To answer queries over such databases, Fernandez and Minker generate the minimal models of the database and use the constraints to select the models that give the semantics of the database. Their algorithms work for stratified, function-free databases.

We show how, for positive disjunctive deductive databases with or without functions and with denial constraints without negation, semantic compilation can be adapted to avoid the construction of the minimal models. Other advantages of this approach are that minimal models do not have to be finite, and any proof procedure for disjunctive databases can be used to evaluate queries.

The next section gives background definitions. Section 3 describes and analyzes the method for handling negative literals with semantic compilation. Section 4 describes the compilation of constraints into disjunctive databases.


2  Background 

Deductive databases are comprised of syntactic information and semantic information. The syntactic information consists of an intensional database (IDB), which is a set of clauses (or rules) of the form $A_1 \vee \cdots \vee A_m \leftarrow B_1, \ldots, B_n$, $m, n > 0$, where each $A_i$ is an atom and each $B_i$ is a literal (i.e., an atom or its negation), and the extensional database (EDB), which is the set of clauses of the form $A_1 \vee \cdots \vee A_m \leftarrow$. IDB clauses are also called rules. EDB clauses are also called facts. If all the $B_i$s in the IDB rules are positive literals and $m = 1$, then the deductive database is called definite. If $m$ is larger than 1 in at least one of the clauses, the database is called disjunctive. A pair of literals is said to be a pair of complementary literals if one is an atom and the other a negated atom.

The semantic information about the database consists of a set of integrity constraints (IC). The constraints considered in this paper have the form $\leftarrow C_1, \ldots, C_n, E_1, \ldots, E_m$, where each $C_i$ is a literal whose predicate appears in an EDB fact or the head of an IDB rule, and each $E_i$ is an evaluable expression.

An integrity constraint restricts the states that a database can take. For example, the integrity constraint "No person can be both male and female," ← person(X), male(X), female(X), restricts people in a database from having both properties. Because the constraints on a database do not enable the deduction of new answers but rather give information about existing knowledge and answers, they are considered semantic information rather than syntactic information.

Chakravarthy, Grant, and Minker [CGM90] have shown how integrity constraints can be compiled into a query to identify space that is bound to fail. We now describe their algorithm. The algorithm assumes that the EDB and IDB contain function-free definite clauses, and that IC is a finite set of first-order clauses that always hold in the theory EDB ∪ IDB. It is assumed that the set IC contains constraints that can be derived from other constraints in IC.

The semantic optimization process performs two preliminary steps: flattening of the IDB clauses, so that all IDB clause bodies contain only EDB predicates or recursive predicates, and variable substitution of the IC clauses. Flattening is performed according to [Rei78] and consists of a series of unfoldings of the rules in the IDB. A database with direct recursion can be flattened by flattening the nonrecursive predicates in the recursive rules before merging the rules with the constraints [GGL+93]. Partial flattening can be done in rules with nondirect recursion by inspecting the dependency graph associated with the rules [GGL+93]. Without loss of generality, but possibly with some loss of potential query processing efficiency at run time, the flattening process does not have to be complete. Moreover, the negative literals are not flattened.

Background definition: Variable substitution

A variable substituted form of an integrity constraint is a clause obtained by

1. Replacing each distinct constant in the constraint by a distinct variable and adding the fact that the constant equals the variable.
2. Replacing each variable that occurs more than once by new distinct variables, and adding equalities that represent the bindings.
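The two substitution steps can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes a Prolog-like convention in which variables start with an uppercase letter, represents literals by a hypothetical `Lit(pos, pred, args)` tuple, and returns the equalities separately as `(variable, term)` pairs rather than appending them to the clause.

```python
from collections import namedtuple
from itertools import count

# A literal: sign (True = positive), predicate name, argument tuple.
Lit = namedtuple("Lit", "pos pred args")

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def variable_substitute(body):
    """Variable-substituted form of the denial constraint  <- body.

    Every constant occurrence and every repeated occurrence of a variable
    is replaced by a fresh variable, and an equality recording the binding
    is returned. Giving each constant occurrence its own variable is
    equivalent to applying step 1 and then step 2. The fresh names
    V1, V2, ... are assumed not to occur in the input.
    """
    fresh = (f"V{i}" for i in count(1))
    seen = set()
    new_body, equalities = [], []
    for lit in body:
        new_args = []
        for term in lit.args:
            if is_var(term) and term not in seen:
                seen.add(term)               # first occurrence of a variable: keep it
                new_args.append(term)
            else:                            # constant, or repeated variable
                v = next(fresh)
                equalities.append((v, term))
                new_args.append(v)
        new_body.append(Lit(lit.pos, lit.pred, tuple(new_args)))
    return new_body, equalities
```

Applied to the constraint ← reachable(a, W1), reachable(c, W1), the sketch returns reachable(V1, W1), reachable(V2, V3) with V1 = a, V2 = c, V3 = W1. Up to variable names, this is the variable substituted form used in the example of Section 3.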

After preprocessing the IDB and the IC using variable substitution, the next step in semantic optimization is to find all partial constraints that apply to each IDB clause. A residue is obtained by determining whether part of some clause in IC partially subsumes an IDB clause. If so, then the remaining part of the IC clause becomes a residue that constrains the IDB clause: the residue expresses something that must be true when the clause is true. The general form of the integrity constraints produced by variable substitution allows us to find all possible residues.

Background definition: Partial subsumption

A (variable substituted) integrity constraint I partially subsumes a rule R if a subset of I subsumes R but I does not subsume R. An integrity constraint $I = \leftarrow C_1, \ldots, C_n$ partially subsumes a rule $R = A \leftarrow B_1, B_2, \ldots, B_m$ if and only if there exist a nonempty subset $S$ of $\{C_1, \ldots, C_n\}$ and a substitution $\theta$ such that

$S\theta \subseteq \{B_1, \ldots, B_m\}$.

Background definition: Residue

If $C'_1, \ldots, C'_k$ are the literals in the constraint that are not in $S$, then the clause $\leftarrow C'_1\theta, \ldots, C'_k\theta$ is a residue obtained from R and I.
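Partial subsumption and residue extraction can be sketched by trying every nonempty subset $S$ of the constraint's literals and every way of matching $S$ onto the rule body. This brute-force sketch is exponential in the constraint length and is meant only to make the definitions concrete. It makes several assumptions: variables start with an uppercase letter, the constraint and rule are standardized apart, and matching is one-way, binding only constraint variables (our reading of "subsumes"). The helper names and the rule body female(M), parent(M, C) in the usage example are hypothetical.

```python
from collections import namedtuple
from itertools import product

Lit = namedtuple("Lit", "pos pred args")

def is_var(term):
    return term[:1].isupper()   # assumed convention: variables are capitalised

def match(c, b, theta):
    """Extend theta so that c under theta equals b, or return None.
    Only constraint variables are bound; rule terms are left fixed."""
    if c.pos != b.pos or c.pred != b.pred or len(c.args) != len(b.args):
        return None
    theta = dict(theta)
    for s, t in zip(c.args, b.args):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def apply(theta, lit):
    return Lit(lit.pos, lit.pred, tuple(theta.get(a, a) for a in lit.args))

def residues(constraint, body):
    """All residues  <- (I \\ S) theta  with S nonempty and S theta in body."""
    found = []
    # Each constraint literal is either outside S (None) or mapped to a body literal.
    for choice in product([None, *body], repeat=len(constraint)):
        if all(x is None for x in choice):   # S must be nonempty
            continue
        theta = {}
        for c, b in zip(constraint, choice):
            if b is not None:
                theta = match(c, b, theta)
                if theta is None:
                    break
        if theta is None:
            continue
        rest = [apply(theta, c) for c, b in zip(constraint, choice) if b is None]
        if rest and rest not in found:       # empty rest means full, not partial, subsumption
            found.append(rest)
    return found
```

With the constraint ← person(X), male(X), female(X) and the body female(M), parent(M, C), only female(X) matches, with θ = {X/M}, so the single residue is ← person(M), male(M).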




Once a set of residues is found for an IDB clause, each residue may be incorporated into the clause. Attaching the residue to the IDB clause produces a semantically equivalent clause.

Background definition: Semantically constrained rules

A semantically constrained rule is defined to be of the form

$A \leftarrow B_1, \ldots, B_n, \{R_1, \ldots, R_k\}$.

Here $A$ is the procedure head of the rule, each literal $B_i$ is either a regular literal or an evaluable predicate, and the $R_j$s are residues obtained from the rule $A \leftarrow B_1, \ldots, B_n$ and the integrity constraints.

Background definition: Semantically constrained database

Let $D$ be a definite database and let $E$, $I$, and $C$ be the EDB, IDB, and ICs that form $D$. The semantically constrained database of $D$, $D'$, is the set of semantically constrained rules obtained from $D$.

Consider a database consisting of two IDB clauses:

economic_ties(C1, C2) ← trade(C1, Z), trade(Z, C2)
trade(C1, C2) ← friends(C1, C2)

meaning that C1 has economic ties with C2 if C1 trades with some Z that trades with C2, and that C1 trades with C2 if C1 is friends with C2. The variable substituted integrity constraint above partially subsumes the first IDB clause to produce the residue ← C1=usa, C2=iraq.

A new semantically constrained clause is produced when the residue is merged with the IDB clause:

economic_ties(C1, C2) ← trade(C1, Z), trade(Z, C2), {← C1=usa, C2=iraq}.

The constrained axiom says that it is not possible for the countries C1 and C2 to have economic ties if C1 is usa and C2 is iraq.

The semantically constrained database consists of the semantically constrained clauses together with the EDB facts and the integrity constraint above.

3 Handling Negative Literals

The semantic optimization method [CGM90] augments queries and rules with semantic information about the search space of positive atoms in a query or the rules when those atoms occur positively in the body of one or more integrity constraints. In this section, we generalize the approach so that it provides semantic information about the search space of negative as well as positive literals in the query. The generalized method handles all cases in which literals with the same predicate occur positively or negatively in one or more constraints. Semantic information can be obtained in four cases:

1. When a predicate symbol of a positive constraint literal appears in a positive literal in the query or the rules.

2. When a predicate symbol of a negative constraint literal appears in a negative literal in the query or the rules.

3. When a predicate symbol of a negative constraint literal appears in a positive literal in the query or the rules.

4. When a predicate symbol of a positive constraint literal appears in a negative literal in the query or the rules.

Case 1 is covered in [CGM90] and forms the basis for the treatment of Cases 2, 3, and 4. Case 2 can be treated like Case 1 by temporarily renaming the negated predicates as unique new positive predicates and applying semantic compilation as in Case 1. Case 3 reduces to using the integrity constraint as a deductive rule and is not particularly interesting. Case 4 is the most interesting. Here semantic compilation with negation provides a means to find answers for queries with either or both of two problems. The queries may contain possibly nonground negative literals, in which case regular procedures such as SLDNF-resolution do not apply. They may also have possibly an infinite number of answers, in which case approaches such as constructive negation [Cha88, Prz89b] cannot be used.

3.1 Algorithm for Generalized Semantic Compilation

Semantically compiling a set of integrity constraints with a database rule or query has two main steps: (1) obtaining partial constraints, and (2) integrating the partial constraints into the body of the query or rule. A query may be regarded as a rule of the form $Query \leftarrow Q_1, \ldots, Q_n$. From now on, we will refer to both queries and rules as axioms. To compile an integrity constraint into an axiom based on the correspondence of a positive or negative axiom literal and a positive or negative integrity constraint literal, we define mixed partial subsumption. The term mixed refers to the fact that the correlation is done on two complementary literals, one from the constraint and one from the axiom. Since the residues produced by mixed partial subsumption contain a different type of information from the constraints found by regular partial subsumption, we call them mixed residues. We also define a merging algorithm that replaces the axiom literal with the disjunction of its mixed partial constraints.
Definition 1. Mixed partial subsumption

The mixed partial subsumption of an axiom $R = A \leftarrow B_1, \ldots, B_m$ by an integrity constraint $I = \leftarrow C_1, \ldots, C_n$ can occur if and only if there exist a $B_i$ and a $C_j$ such that

1. $B_i$ and $C_j$ are complementary literals.
2. There is a substitution $\theta$ such that $C_j\theta = \neg B_i$.

The clause $(C_1, \ldots, C_{j-1}, C_{j+1}, \ldots, C_n)\theta$ is the mixed residue obtained from R and I.
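Definition 1 can be turned into a short procedure. For each axiom literal $B_i$ and each constraint literal $C_j$ of opposite sign, look for a substitution that maps the atom of $C_j$ onto the atom of $B_i$. If one exists, apply it to the remaining constraint literals. The following is a minimal sketch under several assumptions: variables start with an uppercase letter, the clauses are standardized apart, and matching is one-way, binding only constraint variables. It uses a hypothetical `Lit(pos, pred, args)` representation in which equalities are ordinary positive literals with predicate `=`.

```python
from collections import namedtuple

Lit = namedtuple("Lit", "pos pred args")

def is_var(term):
    return term[:1].isupper()   # assumed convention: variables are capitalised

def match_ignoring_sign(c, b):
    """Substitution theta (constraint variables -> axiom terms) such that the
    atom of c under theta equals the atom of b, or None if there is none."""
    if c.pred != b.pred or len(c.args) != len(b.args):
        return None
    theta = {}
    for s, t in zip(c.args, b.args):
        if is_var(s):
            if theta.setdefault(s, t) != t:
                return None
        elif s != t:
            return None
    return theta

def mixed_residues(constraint, body):
    """Yield (i, residue) for each axiom literal B_i and constraint literal C_j
    with opposite signs and C_j theta = not B_i; the residue is
    (C_1, ..., C_{j-1}, C_{j+1}, ..., C_n) theta."""
    for i, b in enumerate(body):
        for j, c in enumerate(constraint):
            if c.pos == b.pos:                   # condition 1: complementary signs
                continue
            theta = match_ignoring_sign(c, b)
            if theta is None:                    # condition 2 fails
                continue
            rest = constraint[:j] + constraint[j + 1:]
            yield i, [Lit(l.pos, l.pred, tuple(theta.get(a, a) for a in l.args))
                      for l in rest]
```

On the variable-substituted constraint ← reachable(A, W1), reachable(C, W2), A=a, C=c, W1=W2 and the body ¬reachable(X, Y) of the rule for unreachable, the sketch yields two mixed residues. They are reachable(C, W2), X=a, C=c, Y=W2 and reachable(A, W1), A=a, X=c, W1=Y, which agree with the residues in the worked example of this section up to variable renaming.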

The integrity constraint $\leftarrow C_1, \ldots, C_n$ can be interpreted to say that it is not the case that $C_1$, $C_2$, ..., and $C_n$ are simultaneously true in the database. In particular, if the constraint is restricted with the substitution $\theta$, it is not the case that the conjunction $C_1\theta \wedge \cdots \wedge C_n\theta$ is true in the database. Hence, if the conjunction $C_1\theta \wedge \cdots \wedge C_{j-1}\theta \wedge C_{j+1}\theta \wedge \cdots \wedge C_n\theta$ is true, then it is not possible for $C_j\theta$ to be true. Since negation in deductive databases is interpreted as negation-as-failure, this is equivalent to saying that $\neg C_j\theta$ is true. In other words, we can rewrite the constraint as $\neg C_j\theta \leftarrow C_1\theta, \ldots, C_{j-1}\theta, C_{j+1}\theta, \ldots, C_n\theta$. But notice that $\neg C_j\theta = B_i$. Therefore, we have $B_i \leftarrow (C_1, \ldots, C_{j-1}, C_{j+1}, \ldots, C_n)\theta$. The mixed residue obtained through mixed partial subsumption thus defines a search space that is an alternative to the search space of $B_i$. This new implication, deduced from the semantic information, motivates the following definition of merging a rule with an integrity constraint.
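Definition 2. Mixed merging

To merge the mixed residue $(C_1, \ldots, C_{j-1}, C_{j+1}, \ldots, C_n)\theta$ with the rule R, replace the rule by the new rule

$A \leftarrow B_1, \ldots, B_{i-1}, C_1\theta, \ldots, C_{j-1}\theta, C_{j+1}\theta, \ldots, C_n\theta, B_{i+1}, \ldots, B_m$.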
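Mixed merging itself is a list splice. The sketch below uses hypothetical function names and an assumed `Lit(pos, pred, args)` representation. It builds the merged rule of Definition 2 and, as in the unfolding step of the worked example, keeps the original axiom alongside one merged axiom per mixed residue. Each residue is merged separately. This matches the example, where every residue concerns the single literal ¬reachable(X, Y); combining residues for several different body literals is not covered by this sketch.

```python
from collections import namedtuple

Lit = namedtuple("Lit", "pos pred args")

def mixed_merge(head, body, i, residue):
    """Definition 2: replace B_i in  head <- body  by the literals of the
    mixed residue, giving  head <- B_1..B_{i-1}, residue, B_{i+1}..B_m."""
    return head, body[:i] + list(residue) + body[i + 1:]

def compile_axiom(head, body, mixed):
    """The original axiom followed by one merged axiom per (i, residue)
    pair, mirroring rules R1, R2, R3 of the unreachable example."""
    return [(head, body)] + [mixed_merge(head, body, i, r) for i, r in mixed]
```

For the rule unreachable(X, Y) ← ¬reachable(X, Y) and the residue reachable(C, X2), X=a, C=c, Y=X2, the second rule returned has exactly that residue as its body. This corresponds to R2 in the example.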

Algorithm 1. Semantic Compilation with Negations

Let DB be an initial deductive database. Let DB' contain only the EDB of DB. Then:

1. For each integrity constraint in the database, use mixed partial subsumption to find all mixed residues between the integrity constraint and the axiom.
2. For each literal in the axiom that has a corresponding mixed residue, use mixed merging to replace the literal with the mixed residue.
3. For each integrity constraint in the database, use partial subsumption to find residues for the new axiom and the original flattened axiom in DB.
4. Add all regular residues to the new axiom and the original axiom using the usual merge definition. Add the new axioms to DB'.

DB' contains the new semantically compiled database.

This algorithm performs semantic compilation for Cases 1, 3, and 4 above. The following example illustrates the algorithm.

The database below describes two disjoint groups of points connected by edges (i.e., two distinct connected components of the graph). The disjointness can be described by an integrity constraint. The predicate reachable computes all ordered pairs of points that are connected by a path of one or more edges. The predicate unreachable has a search tree with a cycle on reachable. IC1 is sufficient to describe the disjoint edge groups. The database consists of the following clauses:

% group 1
edge(a, b).
edge(b, a).
% group 2
edge(c, d).
edge(d, c).

reachable(X, Y) ← edge(X, Y).
reachable(X, Y) ← reachable(X, Z), edge(Z, Y).
unreachable(X, Y) ← ¬reachable(X, Y).

IC1: ← reachable(a, W1), reachable(c, W1).

Variable substitution of the integrity constraint produces a new form of the constraint: ← reachable(A, W1), reachable(C, W2), A=a, C=c, W1=W2. Mixed partial subsumption of IC1 with the rule for unreachable produces the following set of residues:

{← reachable(C, X2), X=a, C=c, Y=X2;  ← reachable(A, X1), X=c, A=a, Y=X1}.

Merging the residues with the rule produces the following new rule:

unreachable(X, Y) ← (¬reachable(X, Y)) ∨ (reachable(C, X2), X=a, C=c, Y=X2) ∨ (reachable(A, X1), A=a, X=c, Y=X1).

Unfolding the semantically compiled rule gives three new rules:

R1: unreachable(X, Y) ← ¬reachable(X, Y).
R2: unreachable(X, Y) ← reachable(C, X2), X=a, C=c, Y=X2.
R3: unreachable(X, Y) ← reachable(A, X1), X=c, A=a, Y=X1.




R2 and R3 can be reduced to the following:

R2': unreachable(a, Y) ← reachable(c, Y).
R3': unreachable(c, Y) ← reachable(a, Y).
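Reducing R2 and R3 amounts to eliminating the equalities by substitution. The following is a minimal sketch. It assumes equalities are stored as positive literals with predicate `=` and that variables start with an uppercase letter. It also treats an equality between two distinct constants as making the rule unusable, in which case the function returns `None`. The function name is hypothetical.

```python
from collections import namedtuple

Lit = namedtuple("Lit", "pos pred args")

def is_var(term):
    return term[:1].isupper()   # assumed convention: variables are capitalised

def eliminate_equalities(head, body):
    """Remove each equality  S = T  from the body by binding a variable side
    to the other side and substituting throughout the rule."""
    subst = {}

    def walk(t):                               # follow a chain of bindings
        while t in subst:
            t = subst[t]
        return t

    rest = []
    for lit in body:
        if lit.pos and lit.pred == "=":
            s, t = walk(lit.args[0]), walk(lit.args[1])
            if s == t:
                continue
            if is_var(s):
                subst[s] = t
            elif is_var(t):
                subst[t] = s
            else:
                return None                    # two distinct constants: rule can never fire
        else:
            rest.append(lit)

    def resolve(lit):
        return Lit(lit.pos, lit.pred, tuple(walk(a) for a in lit.args))

    return resolve(head), [resolve(l) for l in rest]
```

Applied to R2, the sketch binds X to a, C to c, and Y to X2, giving unreachable(a, X2) ← reachable(c, X2). That is R2' with X2 in place of Y.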

From R1, which contains a negative literal, the query ← unreachable(a, Y) cannot be answered using a procedure such as SLDNF-resolution, since the nonground query ← ¬reachable(a, Y) has to be solved. One possibility is to extend the proof procedure with constructive negation. However, to use constructive negation, one must carefully extend SLDNF-resolution to incorporate constructive answers. Even though there is a finite number of answers for the query ← reachable(a, Y), a simple application of SLD-resolution to obtain the answers will result in an infinite computation. However, with R2' and R3' produced by Algorithm 1, two answer substitutions can be computed: {Y/c} and {Y/d}. Moreover, suppose the integrity constraint ← … is also added to the database. Then, after the compilation, the following two rules are generated:

R4: unreachable(X, Y) ← reachable(a, X), reachable(c, Y).
R5: unreachable(Y, X) ← reachable(a, X), reachable(c, Y).

From these two rules the rest of the answers for ← unreachable(X, Y) are obtained.
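The sketch below shows the compiled rules at work on the example's finite data. It computes reachable bottom-up as a least fixpoint. That is our choice for brevity; the discussion above concerns top-down SLD-style evaluation, where the left-recursive rule loops. It then evaluates the reduced rules unreachable(a, Y) ← reachable(c, Y) and unreachable(c, Y) ← reachable(a, Y), which contain no negation. Comparing the result against the closed-world complement over the four points illustrates the claim in Section 3.2 that the answers found are correct but not necessarily complete. All names are hypothetical.

```python
from itertools import product

# Edge facts of the example: two disjoint connected components.
EDGES = {("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
NODES = {"a", "b", "c", "d"}

def reachable(edges):
    """Least fixpoint of
         reachable(X, Y) <- edge(X, Y).
         reachable(X, Y) <- reachable(X, Z), edge(Z, Y).
    computed bottom-up, so the left recursion cannot loop."""
    reach = set(edges)
    while True:
        step = {(x, y) for (x, z) in reach for (z2, y) in edges if z == z2}
        if step <= reach:                      # no new pairs: fixpoint reached
            return reach
        reach |= step

def unreachable_compiled(reach):
    """Answers of the reduced rules, evaluated without negation:
         unreachable(a, Y) <- reachable(c, Y).
         unreachable(c, Y) <- reachable(a, Y)."""
    from_r2 = {("a", y) for (x, y) in reach if x == "c"}
    from_r3 = {("c", y) for (x, y) in reach if x == "a"}
    return from_r2 | from_r3

def unreachable_by_complement(reach, nodes):
    """Closed-world reading of  unreachable(X, Y) <- not reachable(X, Y)
    over a finite, known set of nodes."""
    return set(product(nodes, repeat=2)) - reach
```

Here the compiled rules return (a, c), (a, d), (c, a), and (c, b). All four lie in the complement, while pairs such as (b, c) are not produced by these two rules.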

The following theorem formally establishes the correctness of Algorithm 1.

Theorem 1. Let the tuple DB = (IDB, EDB, IC) be a deductive database. Let DB' = (IDB', EDB, IC) be the database obtained after applying Algorithm 1, and let Q be a query. A substitution θ is a correct answer substitution for DB' ∪ {Q} if and only if θ is a correct answer substitution for DB ∪ {Q}.

3.2  Ctmiparison  witli  Coiistriietivo  Nttgatioii 

When a query contains a negative literal, there are two procedural methods for expanding the search space of the negative literal. Negation-as-finite-failure (NAFF), used in SLDNF-resolution, expands the entire search tree for the positive atom in the literal. If the entire tree fails finitely, then the negative literal can be considered to be true. NAFF requires the negative literal to be ground and the positive atom's search tree to be finite; otherwise, it returns no answers. Constructive negation eliminates the need for the negative literal to be ground. It expands the search tree of the literal's positive atom to obtain all the answer substitutions for the variables in the atom, and it constructs an expression, formed with equalities and inequalities, that states the values that the variables in the literal can or cannot take. Basically, any set of values that makes the atom in the literal true is disallowed as a substitution for the negated atom. Constructive negation still has the limitation that the search tree associated with the positive atom must be finite. Another disadvantage is that constructive negation requires all the possible answers for the positive atom before it can return an answer for the negation of the atom. This restriction can make the process very long and extremely complex, especially if a single answer would satisfy the user posing the query to the database.
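The following minimal Python sketch shows the kind of constraint that constructive negation builds for a negated literal. It assumes that the positive atom's search tree is finite and that every answer substitution is ground. The function name `negation_constraint` and the dictionary representation of substitutions are illustrative choices and are not part of the method as published.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Subst = Dict[str, str]  # variable name -> ground constant (illustrative representation)

def negation_constraint(variables: Sequence[str],
                        answers: List[Subst]) -> Tuple[str, Callable[[Subst], bool]]:
    """Given all (ground) answers for p(vars), describe the bindings allowed for not p(vars).

    Each answer is excluded by a disjunction of inequalities. The constraint
    for the negated literal is the conjunction of these disjunctions.
    """
    clauses = ["(" + " or ".join(f"{v} != {ans[v]}" for v in variables) + ")"
               for ans in answers]
    text = " and ".join(clauses) or "true"   # no answers: every binding is allowed

    def allowed(binding: Subst) -> bool:
        return all(any(binding[v] != ans[v] for v in variables) for ans in answers)

    return text, allowed

# Hypothetical example: p(X) has exactly the answers X = a and X = b.
text, allowed = negation_constraint(["X"], [{"X": "a"}, {"X": "b"}])
print(text)                                       # (X != a) and (X != b)
print(allowed({"X": "c"}), allowed({"X": "a"}))   # True False
```

The complete answer list must be available before `text` can be built. This is the disadvantage of constructive negation discussed above.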

As in constructive negation, Algorithm 1 enables a search for values of variables in nonground negative literals. It uses constraint information to identify the part of the search space that cannot co-occur with the query's negated literal. The algorithm produces a new, semantically equivalent query that covers this search space, and it constructs substitutions of allowed values for the variables that appear in the negative literal. Unlike constructive negation, the algorithm does not necessarily obtain the complete set of substitutions. However, the answers that are found are correct. For example, consider a query containing a nonground negative literal, an integrity constraint, and the residue obtained from them. Suppose the residue is expanded as a query and produces the answer substitutions σ1 and σ2 for the variables X and Z. The constraint says that no state of the database will contain both the negated atom and the residue instantiated by σ1, or both of them instantiated by σ2. Thus, any bindings obtained for the remaining variables through the query's positive literals can be concatenated with σ1 and σ2 to produce answer substitutions for the query. Suppose the expansion of the positive literal produces two substitutions. Then the final set of answer expressions for the query consists of the four combinations of these two substitutions with σ1 and σ2. Moreover, in a Prolog-like interpreter, the answers are given one at a time, and if only one answer is required, the computation can be stopped after the first answer is obtained. Other answers to the query may exist.
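The concatenation of substitutions, and the delivery of answers one at a time, can be sketched with a Python generator. This is an illustration under stated assumptions. The residue answers σ1 and σ2 and the bindings for the remaining literal are hypothetical dictionaries: the constants a through f and the variable Y are invented for the example. The variable sets are also assumed to be disjoint, so that concatenation is simply a union.

```python
from typing import Dict, Iterable, Iterator, List

Subst = Dict[str, str]

def combined_answers(residue_answers: Iterable[Subst],
                     other_answers) -> Iterator[Subst]:
    """Yield residue answers concatenated with bindings for the remaining literal.

    `other_answers` is a zero-argument callable that is re-invoked for every
    residue answer, mimicking how a Prolog-like interpreter re-solves a goal
    on backtracking. Answers are produced lazily, one at a time.
    """
    for s1 in residue_answers:
        for s2 in other_answers():
            yield {**s1, **s2}   # variable sets assumed disjoint

sigmas = [{"X": "a", "Z": "c"}, {"X": "b", "Z": "d"}]   # hypothetical sigma1, sigma2

def remaining() -> List[Subst]:
    return [{"Y": "e"}, {"Y": "f"}]                      # hypothetical bindings for Y

answers = combined_answers(sigmas, remaining)
print(next(answers))   # {'X': 'a', 'Z': 'c', 'Y': 'e'}; stop here if one answer suffices
```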


4  Disjunctive  Databases  and  Constraints 

When a set of integrity constraints IC is associated with a disjunctive database P, the semantics of P ∪ IC can be described in two basic ways. We consider the set of models associated with P alone:

1. If each constraint is true in each model, then the database is considered consistent with IC, and together the models give the meaning of the database. Otherwise, the database is considered inconsistent with IC, and the database has no meaning.

2. If there is some model in which all the constraints are true, then the database is considered to be consistent with IC. The set of models in which all constraints are true describes the meaning of the database. Otherwise, the database is considered inconsistent with IC, and the database has no meaning.

For definite databases, under minimal model semantics, these two cases are equivalent because only one model is associated with a database: its minimal model. However, under minimal model semantics, disjunctive databases may be inconsistent with IC by the first definition yet be consistent by the second definition. This is possible because such databases have multiple minimal models. The following disjunctive database illustrates this point:

IDB1: covalent_bond(A, B) ∨ double_bond(A, B) ← carbon(A), bonded(A, B).

EDB1: carbon(item1). EDB2: hydroxyl(item2). EDB3: bonded(item1, item2).

The rule IDB1 says: if some entity A is a carbon atom and it is bonded to some other entity B, then the bond is either a covalent bond or a double bond. The facts indicate that item1 is a carbon atom, item2 is a hydroxyl group, and the two items are bonded. The database has two minimal models:




M1 = {carbon(item1), hydroxyl(item2), bonded(item1, item2), double_bond(item1, item2)}

M2 = {carbon(item1), hydroxyl(item2), bonded(item1, item2), covalent_bond(item1, item2)}

Now consider a constraint that says: Nothing can be a hydroxyl group and be double-bonded to something else.

IC1: ← double_bond(X, Y), hydroxyl(Y).

The first minimal model violates the constraint with the substitution {item1/X, item2/Y}. Thus, according to the integrity constraint, the first minimal model should be disregarded when answering queries to the database.
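The two definitions of consistency can be checked mechanically on this example. The sketch below is a brute-force illustration and is not the Fernández–Minker algorithm cited next. It grounds the database by hand, writing i1 and i2 for item1 and item2, and assumes the predicate names used above. It then enumerates every subset of the five ground atoms, keeps the minimal models, and tests each of them against IC1.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (head disjuncts, body atoms), ground

def rule(head: Set[str], body: Set[str]) -> Rule:
    return frozenset(head), frozenset(body)

def is_model(m: FrozenSet[str], rules: List[Rule]) -> bool:
    # "head <- body" holds in m unless the body is true and no head atom is true
    return all(not body <= m or bool(head & m) for head, body in rules)

def minimal_models(atoms: Set[str], rules: List[Rule]) -> List[FrozenSet[str]]:
    models = [frozenset(c) for k in range(len(atoms) + 1)
              for c in combinations(sorted(atoms), k)
              if is_model(frozenset(c), rules)]
    return [m for m in models if not any(other < m for other in models)]

def satisfies(m: FrozenSet[str], constraints: List[FrozenSet[str]]) -> bool:
    # "<- C1, ..., Ck" is violated when all of C1, ..., Ck are true in m
    return not any(c <= m for c in constraints)

rules = [rule({"covalent_bond(i1,i2)", "double_bond(i1,i2)"}, {"carbon(i1)", "bonded(i1,i2)"}),
         rule({"carbon(i1)"}, set()), rule({"hydroxyl(i2)"}, set()), rule({"bonded(i1,i2)"}, set())]
ic1 = [frozenset({"double_bond(i1,i2)", "hydroxyl(i2)"})]
atoms = set().union(*(h | b for h, b in rules))

mm = minimal_models(atoms, rules)
print(len(mm))                               # 2
print(all(satisfies(m, ic1) for m in mm))    # False: inconsistent under definition 1
print(any(satisfies(m, ic1) for m in mm))    # True: consistent under definition 2
meaning = [m for m in mm if satisfies(m, ic1)]  # only the model with covalent_bond survives
```

Enumerating all subsets is exponential in the number of ground atoms, so this approach is only suitable for toy examples like this one.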

Fernández and Minker have defined an algorithm that takes a disjunctive database without function symbols, computes its minimal models, and eliminates the models that are not consistent with the set of constraints. Answers to queries are computed over these models [FM91]. An alternative way to give answers to queries that are consistent with the integrity constraints is to transform the database into a new database that incorporates the integrity constraint information. If all of the minimal models of the new database are consistent with the constraints, then any answers obtained through any disjunctive query processing mechanism will be consistent with the constraints as well.

We modify the definitions of mixed partial subsumption and mixed merging in order to describe a procedure that performs such a transformation on (not necessarily function-free) disjunctive databases and constraints without negation.
Definition 3. Disjunctive subsumption

The disjunctive subsumption of a rule R = A1 ∨ ... ∨ An ← B1, ..., Bm by an integrity constraint I = ← C1, ..., Ck can exist if and only if there exist an Ai and a Cj such that they unify with mgu θ. The clause (← C1, ..., Cj−1, Cj+1, ..., Ck)θ is the disjunctive residue obtained from R and I.

Definition 4. Disjunctive merging

To merge the disjunctive residue (← C1, ..., Cj−1, Cj+1, ..., Ck)θ with the rule R, form the new rule (A1 ∨ ... ∨ Ai−1 ∨ Ai+1 ∨ ... ∨ An ← B1, ..., Bm, C1, ..., Cj−1, Cj+1, ..., Ck)θ.

Procedure 1. Semantic Compilation in Disjunctive Databases

Repeat the following steps until no I in IC′ disjunctively subsumes any rule in P′:

1. Select I from IC′; let IC′ be IC′ − {I}.

2. Let R be P′.

3. While R ≠ ∅ do

(a) Use disjunctive subsumption to obtain all disjunctive residues between I and the rules in R.

(b) Use disjunctive merging to generate the new disjunctive rules NR.

(c) If the empty clause is in NR, report that P is inconsistent with IC and stop.

(d) Remove each rule with an empty head from NR and add it to IC′.

(e) Let P′ be P′ ∪ NR.

(f) Let R be NR.
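The following is a minimal Python sketch of Procedure 1. It assumes ground (propositional) clauses, so that unification reduces to equality of atoms and the mgu θ is the identity. The sketch also takes one liberty that is not part of the procedure as stated: merged rules already in P′ are discarded, which guarantees termination in the ground case. All function names are hypothetical. The usage lines replay the carbon and hydroxyl example of this section, with item1 and item2 abbreviated to i1 and i2.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (head disjuncts, body atoms)
Constraint = FrozenSet[str]                   # body C1, ..., Ck of "<- C1, ..., Ck"

def disjunctive_residues(r: Rule, ic: Constraint) -> Iterator[Tuple[str, Constraint]]:
    """Definition 3, ground case: a head atom Ai that also occurs in the
    constraint yields the residue obtained by deleting Ai from the constraint."""
    head, _ = r
    for atom in head:
        if atom in ic:
            yield atom, ic - {atom}

def disjunctive_merge(r: Rule, atom: str, residue: Constraint) -> Rule:
    """Definition 4, ground case: drop Ai from the head, add the residue to the body."""
    head, body = r
    return head - {atom}, body | residue

def compile_ground(program: Set[Rule], ics: List[Constraint]):
    p = set(program)                 # P'
    pending = list(ics)              # IC'
    seen = set(pending)              # every constraint ever queued
    while pending:
        i = pending.pop(0)                                   # step 1
        r = set(p)                                           # step 2
        while r:                                             # step 3
            nr = {disjunctive_merge(rl, a, res)              # steps (a) and (b)
                  for rl in r for a, res in disjunctive_residues(rl, i)}
            if (frozenset(), frozenset()) in nr:             # step (c): empty clause
                raise ValueError("P is inconsistent with IC")
            for head, body in nr:                            # step (d)
                if not head and body not in seen:
                    seen.add(body)
                    pending.append(body)
            nr = {x for x in nr if x[0]} - p                 # keep new, non-empty-head rules
            p |= nr                                          # step (e)
            r = nr                                           # step (f)
    return p, seen

def rule(head, body) -> Rule:
    return frozenset(head), frozenset(body)

idb1 = rule({"covalent_bond(i1,i2)", "double_bond(i1,i2)"}, {"carbon(i1)", "bonded(i1,i2)"})
edb = {rule({a}, set()) for a in ("carbon(i1)", "hydroxyl(i2)", "bonded(i1,i2)")}
ic1 = frozenset({"double_bond(i1,i2)", "hydroxyl(i2)"})

p_new, ic_new = compile_ground({idb1} | edb, [ic1])
for head, body in sorted(p_new - ({idb1} | edb), key=lambda x: len(x[1])):
    print(sorted(head), "<-", sorted(body))
# ['covalent_bond(i1,i2)'] <- ['bonded(i1,i2)', 'carbon(i1)']
# ['covalent_bond(i1,i2)'] <- ['bonded(i1,i2)', 'carbon(i1)', 'hydroxyl(i2)']
```

In this run, the intermediate constraint ← double_bond(i1,i2) is generated from IC1 and the hydroxyl fact. Merging that constraint with IDB1 then yields the unconditional covalent-bond rule.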

Consider our disjunctive deductive database about a carbon atom and a hydroxyl group. When the procedure is applied to IC1 and EDB2, the first iteration produces the following intermediate constraint:

IC2: ← double_bond(X, item2).

The intermediate constraint is added to the database, and the procedure is applied again to produce a new IDB rule, IDB2, from IC2 and IDB1:

IDB2: covalent_bond(A, item2) ← carbon(A), bonded(A, item2).

After all the iterations, the final database produced by the procedure consists of the following rules and facts (note the addition of IDB3):

IDHl  ‘  >  ni'ii  t-huj)  ,ii  A  H  ‘  V  A  .  h  —  '  at /"'Tif  .'t  (i(  A  li  ■ 

I  '  "t  >i/'  tj  A  if '  »n  -.*  /  «—  '  -it  .A  I  ./*';/(  .A  it >  tti^  > 

JPH.'i  ‘fi 'ih  A  .  H  >  V  ■t>>nhlr_h(<j)ilf  A  .  '  —  ■  a»  ^l'tl(  .A  l.  (•••tt'i' 'ii  A  .  hy  di i  yh.  h  ' 

EDB1: carbon(item1). EDB2: hydroxyl(item2). EDB3: bonded(item1, item2).

The new database has one minimal model, {carbon(item1), hydroxyl(item2), bonded(item1, item2), covalent_bond(item1, item2)}. This model is consistent with IC1.
Theorem 2. Let P be a positive disjunctive database and let IC be a set of integrity constraints. Let P′ be the positive disjunctive database produced by Procedure 1 from P and IC. Then the following holds: (1) any minimal model of P that satisfies IC is a model for P′, and (2) any minimal model of P′ is a model for P ∪ IC.

We are not able to guarantee termination of Procedure 1, since it applies to all disjunctive databases and the question of whether or not a set of ICs is consistent with a set of disjunctive clauses is only semi-decidable. However, Procedure 1 is a step toward the extension of semantic compilation to disjunctive theories. We believe that, with minor modifications, the same procedure can be used for more general constraints and databases. The modifications will depend on the semantics chosen for negation in disjunctive databases.


5  Conclusion 

Chakravarthy, Grant, and Minker [CGM90] gave an effective method to optimize queries to definite deductive databases for cases in which partial subsumption involves positive atoms. We have generalized the method to apply to negative literals in the database axioms, in the integrity constraints, or in both. We have studied all possible syntactic interactions between the literals in the query and the literals in the integrity constraints: positive-positive, negative-negative and negative-positive.

The generalized method, called semantic compilation, enables answers to be found for queries with negative literals in cases for which constructive negation or SLDNF find no answers. We have shown that the answers obtained from a semantically compiled database are correct and, in some cases, complete.

When the generalized semantic compilation is applied to disjunctive databases that are inconsistent with the integrity constraints for some minimal models, a new database is produced whose minimal models are all consistent with the integrity constraints, if such a database exists. In this new database, queries can be answered using any procedure for disjunctive databases, without requiring a special mechanism to check the constraints. It remains to be seen how semantic compilation can be applied to disjunctive databases with negation.




References 

[CGM90] U.S. Chakravarthy, J. Grant, and J. Minker. Logic-based approach to semantic query optimization. ACM Transactions on Database Systems, 15(2):162–207, June 1990.

[Cha85] U. Chakravarthy. Semantic Query Optimization in Deductive Databases. PhD thesis, University of Maryland, Department of Computer Science, College Park, 1985.

[Cha88] D. Chan. Constructive negation based on the completed database. In R. Kowalski and K. A. Bowen, editors, Proc. International Conference and Symposium on Logic Programming, pages 111–125, Seattle, Washington, MIT Press, August 15–19, 1988.

[Cla78] K. L. Clark. Negation as failure. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 293–322. Plenum Press, New York, 1978.

[FM91] J. A. Fernández and J. Minker. Bottom-up evaluation on disjunctive databases. In K. Furukawa, editor, Proc. International Conference on Logic Programming, pages 660–675, Cambridge, Massachusetts, 1991. MIT Press.

[Gaa92] T. Gaasterland. Generating Cooperative Answers in Deductive Databases. PhD thesis, University of Maryland, Department of Computer Science, College Park, 1992. (Technical Report UMIACS-TR-92-107, CS-TR-2968, University of Maryland, October 1992.)

[GGL+92] T. Gaasterland, M. Giuliano, A. Litcher, Y. Liu, and J. Minker. Using integrity constraints to control search in knowledge base systems. To appear, Journal of Expert Systems.

[GGMN92] T. Gaasterland, P. Godfrey, J. Minker, and L. Novik. A cooperative answering system. In Andrei Voronkov, editor, Proceedings of the Logic Programming and Automated Reasoning Conference, volume 2, pages 101–120, Russia, July 1992.

[GM78] H. Gallaire and J. Minker, editors. Logic and Data Bases. Plenum Press, New York, April 1978.

[Kow79] R. Kowalski. Logic for Problem Solving. Elsevier Science Publishing Co., Inc., Oxford, 1979.

[Llo87] J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, second edition, 1987.

[MG90] J. Minker and A. Gal. Producing cooperative answers in deductive databases. In P. Saint-Dizier and S. Szpakowicz, editors, Logic and Logic Grammar for Language Processing. Ellis Horwood, Ltd., 1990.

[Prz89b] T. C. Przymusinski. On constructive negation in logic programming. In E.L. Lusk and R.A. Overbeek, editors, Proc. of the North American Conference on Logic Programming, Cleveland, Ohio, October 16–20, 1989. Addendum to Proceedings.

[Rei78] R. Reiter. Deductive question-answering on relational data bases. In H. Gallaire and J. Minker, editors, Logic and Data Bases, pages 149–177. Plenum Press, New York, 1978.

[Ros89] K. A. Ross. A procedural semantics for well-founded negation in logic programs. In Proceedings of the 8th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, Pennsylvania, ACM Press, March 29–31, 1989.


On  the  Interpretation  of 
Set-Oriented  Fuzzy  Quantified  Queries 
and  their  Evaluation 
in  a  Database  Management  System 

Patrick  BOSC  &  Ludovic  LIETARD  &  Olivier  PIVERT 

IRISA/ENSSAT 
BP  447 

Lannion  Cedex 
FRANCE 


Abstract. Many proposals to extend database management systems have been made in the last decade. Some of them aim at supporting a wider range of queries involving fuzzy predicates. Unfortunately, the evaluation of these queries is computationally complex, and efficiency is the concern of this paper. We focus on a particular subset of queries, namely those using fuzzy quantified predicates. More precisely, we consider the case where such predicates apply to sets of elements. Based on some interesting properties of α-cuts of fuzzy sets, we show that the evaluation of these queries can be significantly improved with respect to a naive strategy performing exhaustive scans of sets.

1  Introduction 

The database management systems currently available are based on the relational model, and they suffer from several limitations regarding user or application needs. In particular, it is assumed that data are precisely known (or fully unknown) and that queries are based on crisp conditions. The notion of imprecision can be introduced in such systems at two levels: to represent imprecise or uncertain data, and to allow flexible queries. In this paper, we only consider the second aspect; that is, the data are assumed to take their values in ordinary universes, whereas queries may contain imprecise conditions.

In the context of an extended relational language supporting imprecise querying, such as SQLf [1, 2], queries are viewed as fuzzy predicates. The query is associated with a threshold α, and the retrieved data are the elements of the α-cut. In such a language, it seems natural to compose fuzzy predicates and to introduce fuzzy quantifiers inside queries. Various kinds of compound fuzzy predicates have been proposed in recent years [3, 6]. Base predicates described as fuzzy sets (i.e. by means of characteristic functions) can be altered by linguistic modifiers and combined using connectors or aggregates in order to reach the appropriate semantics. Fuzzy quantifiers were first introduced by L.A. Zadeh [8] to generalize the existential (∃) and universal (∀) quantifiers. A few years later, R. Yager [5, 6, 7] suggested another definition for fuzzy quantifiers, and a new approach based on possibility theory was also proposed by H. Prade [4]. In this paper, we deal with the evaluation of fuzzy quantified predicates that concern sets of tuples. More precisely, our aim is to point out some efficient strategies for the evaluation of fuzzy quantified predicates (according to Zadeh's and Yager's interpretations only), since efficiency is a key point in DBMSs.




In Section 2, fuzzy quantified predicates are introduced along with their interpretation according to Zadeh and Yager, and their evaluation is presented in Section 3. To conclude, we summarize the main results and outline some directions for future work.

2  The  Quantification  of  Set-Oriented  Fuzzy  Predicates 

2.1  Set-Oriented  Fuzzy  Predicates 

In the usual relational framework, it is possible to distinguish the predicates applying to individual elements (x's) of a set X from the predicates whose argument is a set X of elements. Two typical examples of these categories in an SQL language, over the relation UNIVERSITY(teacher, department, salary), are:

"select teacher from UNIVERSITY where salary > 4000"

and "select department from UNIVERSITY group by department having avg(salary) > 4000".

It is possible to extend this notion to fuzzy predicates and to define a set-oriented fuzzy predicate, which assigns to a set X a value in [0,1]. Thus, in this paper, we concentrate on the evaluation of this type of query:

"select ... from R group by Y having Q are A"

where Q is a fuzzy quantifier, Y an attribute of the relation R, and A a fuzzy predicate applying to tuples. Mainly, the topic of this article is the expression and evaluation of the set-oriented predicate "Q x's are A", where the x's are the elements of a usual set X. An example, over the same relation UNIVERSITY, could be:

"select α department from UNIVERSITY group by department having many are well-paid".

This query corresponds to the sentence "find the departments in which many teachers are well-paid, with an overall degree over α". Thus, for each department, we need to evaluate the set-oriented predicate "many teachers are well-paid" to determine whether its value is over α.
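The following Python sketch evaluates this query over a small in-memory relation. It anticipates Zadeh's interpretation of relative quantifiers, given in Section 2.2. The membership functions for well-paid and many, the sample rows, and α = 0.5 are all hypothetical, and SQLf itself is not involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def well_paid(salary: float) -> float:
    """Hypothetical membership: 0 up to 3000, 1 from 5000, linear in between."""
    return min(1.0, max(0.0, (salary - 3000) / 2000))

def many(p: float) -> float:
    """Hypothetical relative quantifier: 0 up to 0.3, 1 from 0.8, linear in between."""
    return min(1.0, max(0.0, (p - 0.3) / 0.5))

def many_are_well_paid(rows: Iterable[Tuple[str, str, float]],
                       alpha: float) -> Dict[str, float]:
    groups: Dict[str, List[float]] = defaultdict(list)
    for _teacher, department, salary in rows:       # group by department
        groups[department].append(well_paid(salary))
    result = {}
    for department, degrees in groups.items():
        degree = many(sum(degrees) / len(degrees))  # Zadeh: Q(sum of mu / n)
        if degree >= alpha:                         # keep only the alpha-cut
            result[department] = degree
    return result

rows = [("t1", "math", 6000), ("t2", "math", 5000), ("t3", "math", 2000),
        ("t4", "cs", 3500), ("t5", "cs", 2500)]
print({d: round(v, 2) for d, v in many_are_well_paid(rows, 0.5).items()})  # {'math': 0.73}
```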

2.2  Fuzzy  Quantifiers  According  to  Zadeh 

Zadeh [8] distinguishes absolute quantifiers from relative ones. The absolute quantifiers (about 3, at least a dozen) are defined on a number (the absolute fuzzy cardinality), while the relative quantifiers (about one half, almost all) are defined on a proportion (the relative fuzzy cardinality). Thus, an absolute quantifier is represented by a function Q from [0,n] to [0,1], whereas a relative quantifier is represented by a function Q from [0,1] to [0,1]. In both cases, the value Q(j) is the satisfaction the user gives to the quantification if j criteria are satisfied.

If $Q_a$ stands for an absolute quantifier, the set-oriented predicate "$Q_a$ x's are A" is interpreted according to the formula $Q_a\big(\sum_{x \in X} \mu_A(x)\big)$, where the x's are the elements of the set X. If $Q_r$ stands for a relative quantifier, the set-oriented predicate "$Q_r$ x's are A" is interpreted according to the formula $Q_r\big(\sum_{x \in X} \mu_A(x)/n\big)$, where n is the usual cardinality of the set X.

Let us consider the absolute quantification "About 3 x's are A" and the relative quantification "At least half x's are A":

Fig. 1. The absolute quantifier "About 3" (horizontal axis: number of elements in X).

Fig. 2. The relative quantifier "At least half" (horizontal axis: proportion; 0.5 and 1 marked).

The fuzzy set {0.5/x1, 0.8/x2, 1/x3}, whose fuzzy cardinality is 0.5 + 0.8 + 1 = 2.3, satisfies "About 3 x's are A" with a degree $Q_a(2.3) = 0.3$. It satisfies the predicate "At least half x's are A" with a degree $Q_r(2.3/3) \approx Q_r(0.77) = 1$.

The interpretation of an absolute quantifier seems very difficult to apply in all cases. This is due to the fuzzy cardinality, which may be equal for two sets that are nevertheless very different. For example, the set {0.1/x1, 0.1/x2, ..., 0.1/x30} is as "About 3" as the set {1/x1, 1/x2, 1/x3}, since both have a fuzzy cardinality of 3. However, according to the human appreciation of "About 3", they are very different. Thus, this paper only deals with the evaluation of relative quantifiers according to Zadeh's interpretation.
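Zadeh's two interpretations can be sketched in a few lines of Python. The figures are not reproduced here, so the quantifier shapes are assumptions. "About 3" is taken to rise linearly from 2 to 3 and fall from 3 to 4. "At least half" is taken as Q(p) = min(1, 2p). Both choices agree with the degrees quoted above ($Q_a(2.3) = 0.3$) and with the values $Q_r(1/3) = 2/3$ and $Q_r(2/3) = 1$ used in Section 2.3. The last line shows the cardinality problem just described.

```python
from typing import Callable, Sequence

def about_3(c: float) -> float:
    """Assumed absolute quantifier 'About 3': triangular, peak at 3, support [2, 4]."""
    if 2.0 <= c <= 3.0:
        return c - 2.0
    if 3.0 < c <= 4.0:
        return 4.0 - c
    return 0.0

def at_least_half(p: float) -> float:
    """Assumed relative quantifier 'At least half': Q(p) = min(1, 2p)."""
    return min(1.0, 2.0 * p)

def zadeh_absolute(q: Callable[[float], float], degrees: Sequence[float]) -> float:
    return q(sum(degrees))                   # Qa(sum of mu_A(x))

def zadeh_relative(q: Callable[[float], float], degrees: Sequence[float]) -> float:
    return q(sum(degrees) / len(degrees))    # Qr(sum of mu_A(x) / n)

A = [0.5, 0.8, 1.0]
print(round(zadeh_absolute(about_3, A), 6))   # 0.3
print(zadeh_relative(at_least_half, A))       # 1.0

# Same fuzzy cardinality (3), very different sets: Qa cannot tell them apart.
print(round(zadeh_absolute(about_3, [0.1] * 30), 6), zadeh_absolute(about_3, [1.0] * 3))  # 1.0 1.0
```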

2.3  Fuzzy  Quantifiers  According  to  Yager 

Yager [5, 6, 7] suggests representing proportional (equivalently, increasing monotonic) quantifiers by means of an Ordered Weighted Averaging aggregation (OWA). First of all, let us recall the definition of an OWA operator:

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_n) = \sum_{i=1}^{n} w_i \cdot x_{k_i}$$

where $x_{k_i}$ is the $i$-th largest value among the $x_k$'s.

A proportional quantifier Q is defined by: Q(0) = 0, ∃k such that Q(k) = 1, and ∀a, b, if a > b then Q(a) ≥ Q(b). Yager's interpretation is to compute the weights $w_i$ of the aggregation from the function Q describing the quantifier. In the case of an absolute proportional quantifier, $w_i = Q(i) - Q(i-1)$. In the case of a relative proportional one, assuming n is the crisp cardinality of X, $w_i = Q(i/n) - Q((i-1)/n)$. For a relative quantifier with Q(1) = 1, these weights are nonnegative and sum to Q(1) − Q(0) = 1.

Let us consider the relative proportional quantifier "At least half" of Fig. 2. The fuzzy set {0.5/x1, 0.8/x2, 0.7/x3} satisfies "At least half x's are A" with a degree given by OWA(w1, w2, w3, 0.5, 0.8, 0.7). We compute:

$$w_1 = Q_r(1/3) - Q_r(0) = 2/3, \qquad w_2 = Q_r(2/3) - Q_r(1/3) = 1/3, \qquad w_3 = Q_r(1) - Q_r(2/3) = 0$$

$$\mathrm{OWA}(w_1, w_2, w_3, 0.5, 0.8, 0.7) = (2/3) \cdot 0.8 + (1/3) \cdot 0.7 + 0 \cdot 0.5 = 23/30 \approx 0.77$$
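Exact rational arithmetic makes this computation easy to reproduce. The following Python sketch derives Yager's weights from a quantifier and applies the OWA operator. The `owa` function also amounts to the naive OWA evaluation of Section 3.3: sort the values, then take the weighted sum. The shape of "At least half" is an assumption, Q(p) = min(1, 2p), chosen because it matches $Q_r(1/3) = 2/3$ and $Q_r(2/3) = 1$ above.

```python
from fractions import Fraction
from typing import Callable, List, Sequence

def at_least_half(p: Fraction) -> Fraction:
    """Relative quantifier 'At least half', assumed to be Q(p) = min(1, 2p)."""
    return min(Fraction(1), 2 * p)

def yager_weights(q: Callable[[Fraction], Fraction], n: int) -> List[Fraction]:
    """w_i = Q(i/n) - Q((i-1)/n) for a relative proportional quantifier."""
    return [q(Fraction(i, n)) - q(Fraction(i - 1, n)) for i in range(1, n + 1)]

def owa(weights: Sequence[Fraction], values: Sequence[Fraction]) -> Fraction:
    """Naive OWA: sort the values in decreasing order, then take the weighted sum."""
    ordered = sorted(values, reverse=True)
    return sum((w * v for w, v in zip(weights, ordered)), Fraction(0))

w = yager_weights(at_least_half, 3)
print(w)        # [Fraction(2, 3), Fraction(1, 3), Fraction(0, 1)]
print(owa(w, [Fraction(1, 2), Fraction(4, 5), Fraction(7, 10)]))   # 23/30
```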


3  Evaluation  of  Requests 

3.1  Principle 




Here we propose algorithms that allow us to determine whether, for a set X resulting from a partitioning, the truth value of the predicate "Q x's are A" is greater than or equal to a given threshold α (degree of satisfaction). We assume that: i) the number n of elements of each set is known (this could easily be added to a usual index structure or implemented in a sort algorithm), and ii) each set can be accessed separately, tuple by tuple, as tuples are required. Our aim is to replace algorithms in Θ(n) (based on naive strategies) by algorithms in O(n) (based on improved strategies). The naive algorithms always read all n tuples of a set, whereas the improved ones may stop earlier. Therefore, the objective is to reduce the number of tuple accesses and, to this end, to identify the conditions that decide whether the computation should continue or can stop. More precisely, the computation (mainly the loop that encompasses data access) can stop in two circumstances: i) when the set cannot reach the desired level (α), and ii) when it is certain that the set will reach the desired level (α) and the precise value of the membership degree is not required. This reasoning is very similar to what is done in the design of "trial and error" or "branch and bound" algorithms, where some heuristic is sought in order to limit the search tree, i.e. the number of candidates to be examined.

3.2  Evaluation  of  Relative  Quantifiers  According  to  Zadeh 

According to Zadeh, the evaluation of the set-oriented predicate "$Q_r$ x's are A" ($Q_r$ being a relative quantifier) for a set X = {x1, ..., xn} is $Q_r\big(\sum_{x \in X} \mu_A(x)/n\big)$. Given a threshold α, the problem amounts to determining the sets X satisfying $Q_r\big(\sum_{x \in X} \mu_A(x)/n\big) \ge \alpha$. According to the curve representing the quantifier $Q_r$ (cf. Fig. 3), the α-cut of $Q_r$ is an interval [a, b]. Our problem therefore turns into the characterization of the sets X = {x1, ..., xn} satisfying

$$a \le \sum_{x \in X} \mu_A(x)/n \le b \qquad (1)$$

Fig. 3. An α-cut: $Q_r\big(\sum_{x \in X} \mu_A(x)/n\big) \ge \alpha \iff a \le \sum_{x \in X} \mu_A(x)/n \le b$.

Naive Algorithm. The naive algorithm consists of an exhaustive scan of the set X = {x1, ..., xn}. Let us write $S_k = \sum_{i=1}^{k} \mu_A(x_i)/n$. We seek to obtain and then test the value $S_n$.


S0 := 0;
for i := 1 to n do
    Si := Si-1 + μA(xi)/n;
endfor;
if Sn ∈ [a, b]
then the set satisfies the predicate with a degree greater than (or equal to) α
else the set satisfies the predicate with a degree smaller than α;
endif

Improved Algorithm. The idea is to modify the naive algorithm using heuristics that allow us to predict whether or not the set will satisfy the double inequality (1). Let k ∈ [0, n−1]; we have:

$$S_n - S_k = \sum_{i=k+1}^{n} \mu_A(x_i)/n.$$

Now, ∀i, $\mu_A(x_i) \in [0,1]$, which implies that $0 \le S_n - S_k \le (n-k)/n$. Thus we can write:

$$\forall k \in [0, n-1], \quad S_k \le S_n \le S_k + (n-k)/n.$$

Therefore, there are two heuristics to stop the algorithm at step k of the iteration:

If $S_k \in [a,b]$ and $S_k + (n-k)/n \in [a,b]$, then it is certain that $S_n \in [a,b]$. We can therefore stop the iteration, because the set satisfies the predicate with a degree greater than or equal to the threshold (it is a success).

If $S_k > b$ or $S_k + (n-k)/n < a$, then it is certain that $S_n \notin [a,b]$. We can therefore stop the iteration, because the set satisfies the predicate with a degree smaller than the threshold (it is a failure). The improved algorithm is thus:

S0 := 0;
for k := 1 to n do
    Sk := Sk-1 + μA(xk)/n;
    if Sk > b or Sk + (n-k)/n < a then exit failure endif;
    if Sk ∈ [a, b] and Sk + (n-k)/n ∈ [a, b] then exit success endif;
endfor;
exit success: the set satisfies the predicate with a degree greater than (or equal to) α.
exit failure: the set satisfies the predicate with a degree smaller than α.
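Both algorithms translate directly into the following Python sketch. The improved version also reports how many tuples it read. Degrees are consumed from an iterator to mimic tuple-by-tuple access. The example assumes "At least half" is Q(p) = min(1, 2p) with α = 0.8, whose α-cut is [a, b] = [0.4, 1]; the membership degrees themselves are hypothetical.

```python
from typing import Iterable, Tuple

def naive_zadeh(degrees: Iterable[float], n: int, a: float, b: float) -> bool:
    """Exhaustive scan: compute S_n, then test a <= S_n <= b (inequality (1))."""
    s = 0.0
    for mu in degrees:
        s += mu / n
    return a <= s <= b

def improved_zadeh(degrees: Iterable[float], n: int, a: float, b: float) -> Tuple[bool, int]:
    """Early-stopping scan; returns (success, number of tuples read).

    Uses S_k <= S_n <= S_k + (n - k)/n to stop as soon as the outcome is certain.
    Assumes `degrees` yields exactly n membership degrees in [0, 1].
    """
    s, k = 0.0, 0
    for mu in degrees:
        k += 1
        s += mu / n
        upper = s + (n - k) / n          # largest value S_n can still reach
        if s > b or upper < a:
            return False, k              # exit failure
        if a <= s <= b and a <= upper <= b:
            return True, k               # exit success
    raise ValueError("expected exactly n degrees")

print(improved_zadeh(iter([1, 1, 1, 1, 0, 0, 0, 0]), 8, 0.4, 1.0))   # (True, 4)
print(improved_zadeh(iter([0, 0, 0, 0, 0, 1, 1, 1]), 8, 0.4, 1.0))   # (False, 5)
```

At k = n the bound (n − k)/n is 0, so one of the two exits always fires by the last tuple at the latest.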

3.3  OWA  Based  Evaluation  of  Quantifications 

We present, for a given set X, the OWA-based evaluation of the predicate "Q x's are A", assuming that Q is a monotonic quantifier. First, we present a naive algorithm to compute the OWA aggregation. Then, we recall some properties of the OWA operator, which are used to derive an improved algorithm from the naive one.

Naive Algorithm. The naive algorithm computing the OWA aggregation is based on an exhaustive scan of the set. It is assumed that W is the vector of the OWA weights (the W[i] are computed from Q), that the xi's are the elements of the set X, and that GV is the result of the aggregation.

for each xi in X do V[i] := μA(xi) endfor;
sort V in decreasing order, giving V';
compute the vector W;
    comment this computation only needs the knowledge of n, the cardinality of X endcomment;
GV := 0;
for i from 1 to n do GV := GV + V'[i] * W[i] endfor;
if GV ≥ α then the set X satisfies the quantification with a degree GV greater than or equal to α endif;

Some  Properties  of  the  OWA  Operator.  The  OWA  is  a  mean  operator  and  so 
it  has  interesting  properties: 


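$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \le \mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, 1, \dots, 1) \qquad (2)$$

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \le \mathrm{OWA}(w_1, \dots, w_n, 1, \dots, 1, x_i, 1, \dots, 1) \qquad (3)$$

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \ge \mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, 0, \dots, 0) \qquad (4)$$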

These properties arise from: i) the monotonicity of any mean operator, and ii) the fact that each x belongs to [0,1]. From these basic properties, one can derive conditions bearing on the $x_i$'s that are necessary for the satisfaction of the condition:

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \ge \alpha.$$

From  (3),  one  can  derive: 

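$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \ge \alpha \;\Rightarrow\; \mathrm{OWA}(w_1, \dots, w_n, 1, \dots, 1, x_i, 1, \dots, 1) \ge \alpha \;\Leftrightarrow\; \Big(\sum_{j=1}^{n-1} w_j \cdot 1\Big) + w_n \cdot x_i \ge \alpha \;\Leftrightarrow\; (1 - w_n) + w_n \cdot x_i \ge \alpha$$

The first equivalence holds because $x_i \le 1$ is the smallest argument and therefore receives the last weight $w_n$. The second holds because the weights sum to 1. Finally, we get:

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, \dots, x_n) \ge \alpha \;\Rightarrow\; \forall i,\; x_i \ge \frac{\alpha + w_n - 1}{w_n} \qquad (5)$$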


This last formula is valid only if $w_n$ is strictly positive; otherwise, no implication can be derived. Moreover, it is only profitable if $(\alpha + w_n) > 1$. Otherwise, the right-hand side is not positive and the condition is trivially satisfied. From (2) and (4), we have:


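$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, 1, \dots, 1) < \alpha \;\Rightarrow\; \mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_n) < \alpha \qquad (6)$$

$$\mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_i, 0, \dots, 0) \ge \alpha \;\Rightarrow\; \mathrm{OWA}(w_1, \dots, w_n, x_1, \dots, x_n) \ge \alpha \qquad (7)$$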


These  last  two  conditions  will  be  used  for  partial  evaluations  of  an  OWA  aggregation 
as  we  will  see  in  the  next  section. 
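Properties (2) to (5) can be checked numerically. The randomized test below is only a sanity check under assumptions of our own: arbitrary non-negative weights normalized to sum to 1, and degrees drawn uniformly from [0, 1]. It is not a proof, and the helper names are hypothetical. Conditions (6) and (7) follow directly from (2) and (4), so checking the two bounds also covers them.

```python
import random
from typing import List


def owa(w: List[float], xs: List[float]) -> float:
    """OWA aggregation: the j-th weight multiplies the j-th largest value."""
    return sum(wi * vi for wi, vi in zip(w, sorted(xs, reverse=True)))


def check_properties(trials: int = 10_000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        raw = [rng.random() for _ in range(n)]
        w = [r / sum(raw) for r in raw]           # non-negative weights summing to 1
        xs = [rng.random() for _ in range(n)]     # membership degrees in [0, 1]
        k = rng.randint(0, n)                     # number of values "already read"
        full = owa(w, xs)
        upper = owa(w, xs[:k] + [1.0] * (n - k))  # right-hand side of (2)
        lower = owa(w, xs[:k] + [0.0] * (n - k))  # right-hand side of (4)
        assert lower - 1e-12 <= full <= upper + 1e-12
        alpha = rng.random()
        w_n = w[-1]
        if full >= alpha and w_n > 1e-6:          # (5) requires w_n > 0
            bound = (alpha + w_n - 1) / w_n
            assert all(x >= bound - 1e-6 for x in xs)
    return True


print(check_properties())  # True
```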




Improved Algorithm. Since n is known, the $w_i$ can be calculated, in particular the last value $w_n$. Thus, if the sum $\alpha + w_n$ is greater than 1, we can apply condition (5) to any tuple $x_j$ of a set and insert the following instruction:

if μA(xj) < (α + W[n] − 1)/W[n] then exit failure endif;
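Condition (5) depends only on α and on the last weight $w_n$. The sketch below assumes, as in Sect. 3.4, that "most" is modeled as Q(x) = x². Under that assumption $w_n = 1 - ((n-1)/n)^2$. The function names are hypothetical. The sketch also shows why the test loses its power for large sets unless α is close to 1.

```python
def element_threshold(alpha: float, w_n: float) -> float:
    """Lower bound (5) that every membership degree must reach.
    Returns 0.0 when the test cannot reject anything: either w_n = 0
    (no implication) or alpha + w_n <= 1 (bound not positive)."""
    if w_n <= 0 or alpha + w_n <= 1:
        return 0.0
    return (alpha + w_n - 1) / w_n


def last_weight(n: int) -> float:
    # Assumption: Q(x) = x^2, hence w_n = Q(1) - Q((n - 1)/n).
    return 1 - ((n - 1) / n) ** 2


print(round(element_threshold(0.73, last_weight(5)), 2))  # 0.25, as in Sect. 3.4

for n in (5, 20, 100):
    # With alpha = 0.73 the threshold drops to 0.0 once alpha + w_n <= 1;
    # with alpha = 0.99 it stays positive for these n.
    print(n, element_threshold(0.73, last_weight(n)), element_threshold(0.99, last_weight(n)))
```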

From a practical point of view, sets with a large number of elements will lead to a low value for $w_n$. Consequently, this condition will rarely apply, unless α is very close to 1.

Now, let us assume that we have already accessed k tuples of a set (tuples $x_1$ to $x_k$), and that the values $v_i = \mu_A(x_i)$ for $i \in [1, k]$ are known. Suppose the (n − k) missing values are taken to be 1 and the result of the OWA aggregation still remains smaller than α. Then, according to formula (6), we can be certain that this set will never reach the desired level α. We have to determine the following aggregation, where we recall that $v_{(i)}$ is the i-th largest value of the $v_j$'s:

$$\mathrm{OWA}(w_1,\dots,w_n,v_1,\dots,v_k,\underbrace{1,\dots,1}_{n-k}) = \sum_{i=1}^{n-k} w_i + \sum_{i=1}^{k} w_{n-k+i}\, v_{(i)}$$

This computation only requires the values V[1] to V[k] to be sorted. In addition, the expression does not have to be recalculated from scratch from step k to step (k + 1): for the first part, it is enough to subtract $w_{n-k}$. Thus, once again, we can specify a condition likely to stop the outer loop:

insert μA(xk) into V[1:k]; comment V[1:k] in decreasing order endcomment;
compute A = Σ(i=1..n−k) W[i] + Σ(i=1..k) W[n−k+i] * V[i];
if A < α then exit failure endif;

When k = n (the last tuple of the set), A equals precisely GV, the degree tied to the set.
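The upper bound A can be maintained as the tuples are read. The sketch below is one possible reading of the two steps above, under our own assumptions. The known degrees are kept sorted with `bisect.insort`, and the first sum is updated by subtracting one weight per access instead of being recomputed. The generator name `upper_bounds` is hypothetical. The weights and degrees are those of Sect. 3.4.

```python
import bisect
from typing import Iterable, Iterator, List


def upper_bounds(w: List[float], degrees: Iterable[float]) -> Iterator[float]:
    """Yield A_k = OWA(w, v_1..v_k, 1, ..., 1) after each access k = 1..n."""
    n = len(w)
    seen: List[float] = []   # degrees read so far, ascending
    head = sum(w)            # sum_{i=1}^{n-k} w_i, here for k = 0
    for k, d in enumerate(degrees, start=1):
        bisect.insort(seen, d)
        head -= w[n - k]     # drop w_{n-k+1} (1-based) when moving from k-1 to k
        # The k known degrees, largest first, take the last k weights.
        tail = sum(wi * vi for wi, vi in zip(w[n - k:], reversed(seen)))
        yield head + tail


W = [0.04, 0.12, 0.2, 0.28, 0.36]
DEGREES = [0.4, 1.0, 0.8, 0.1, 0.9]   # e1..e5, in access order
print([round(a, 3) for a in upper_bounds(W, DEGREES)])
# [0.784, 0.784, 0.728, 0.468, 0.456]  (the last value is GV)
```

The bound never increases from one access to the next, so once it falls below α it stays below.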

Finally, if the value of the membership degree of the set is not needed, we can take advantage of formula (7). This formula states the following. Suppose we have already accessed k tuples of a set (tuples $x_1$ to $x_k$), and the (n − k) missing values are taken to be 0. If the result of the OWA aggregation already reaches α, then we can be sure that the set will reach the desired level α. We have to determine:

$$\mathrm{OWA}(w_1,\dots,w_n,v_1,\dots,v_k,\underbrace{0,\dots,0}_{n-k}) = \sum_{i=1}^{k} w_i\, v_{(i)}$$

Here again, we only have to sort the values V[1] to V[k], and we can specify a condition likely to stop the outer loop:

insert μA(xk) into V[1:k]; comment V[1:k] in decreasing order endcomment;
compute B = Σ(i=1..k) W[i] * V[i];
if B ≥ α then exit success endif;
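In the same spirit, a hypothetical `lower_bounds` generator computes B after each access. Together with A, it brackets the final degree: $B_k \le GV \le A_k$ for every k, by (4) and (2). The first call uses the data of Sect. 3.4. The second call uses made-up degrees with α = 0.6, purely to show an early success exit.

```python
import bisect
from typing import Iterable, Iterator, List


def lower_bounds(w: List[float], degrees: Iterable[float]) -> Iterator[float]:
    """Yield B_k = OWA(w, v_1..v_k, 0, ..., 0) = sum_{i<=k} w_i * v_(i)."""
    seen: List[float] = []   # degrees read so far, ascending
    for k, d in enumerate(degrees, start=1):
        bisect.insort(seen, d)
        # Largest known degrees take the first k weights; assumed zeros add nothing.
        yield sum(wi * vi for wi, vi in zip(w[:k], reversed(seen)))


W = [0.04, 0.12, 0.2, 0.28, 0.36]
print([round(b, 3) for b in lower_bounds(W, [0.4, 1.0, 0.8, 0.1, 0.9])])
# [0.016, 0.088, 0.216, 0.244, 0.456]

# Illustrative data only: with alpha = 0.6, B already reaches alpha after 4 accesses.
first_k = next(
    k for k, b in enumerate(lower_bounds(W, [1.0, 1.0, 1.0, 1.0, 0.2]), start=1)
    if b >= 0.6
)
print(first_k)  # 4
```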

Again, when k = n, B equals both A and GV, the membership degree of the current set. We can now give the final algorithm, for the case where n, the number of tuples of any set, is known in advance:

compute the vector W; comment W[i] = w_i endcomment;
for each xk in X do
    if μA(xk) < (α + W[n] − 1)/W[n] then exit failure endif;
    insert μA(xk) into V[1:k]; comment V[1:k] in decreasing order endcomment;
    A = Σ(i=1..n−k) W[i] + Σ(i=1..k) W[n−k+i] * V[i];
    if A < α then exit failure endif;
    B = Σ(i=1..k) W[i] * V[i];
    if B ≥ α then exit success endif;
enddo;

exit success: the set X satisfies the quantification with a degree greater than (or equal to) α.

exit failure: the quantification will never reach the threshold α.
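Putting the three tests together, the final algorithm can be sketched in Python as below. This is a hedged transcription, not the authors' code. It assumes Yager's weights $w_i = Q(i/n) - Q((i-1)/n)$ and that exactly n degrees are supplied lazily in access order. It returns the outcome together with the number of tuples accessed. `quantified_check` and `most` are hypothetical names.

```python
import bisect
from typing import Callable, Iterable, Tuple


def quantified_check(q: Callable[[float], float], degrees: Iterable[float],
                     n: int, alpha: float) -> Tuple[bool, int]:
    """Early-exit evaluation of "Q elements of X are A" against threshold alpha.
    Returns (success, number of tuples accessed)."""
    w = [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]
    w_n = w[-1]
    # Condition (5); if alpha + w_n <= 1 the bound is <= 0 and never fires.
    threshold = (alpha + w_n - 1) / w_n if w_n > 0 else float("-inf")
    seen = []          # degrees read so far, ascending
    head = sum(w)      # sum_{i=1}^{n-k} W[i], updated incrementally
    k = 0
    for k, d in enumerate(degrees, start=1):
        if k > n:
            raise ValueError("more than n tuples supplied")
        if d < threshold:
            return False, k                      # exit failure via (5)
        bisect.insort(seen, d)
        desc = seen[::-1]                        # V[1..k], decreasing
        head -= w[n - k]
        a = head + sum(wi * vi for wi, vi in zip(w[n - k:], desc))
        if a < alpha:
            return False, k                      # exit failure via (6)
        b = sum(wi * vi for wi, vi in zip(w[:k], desc))
        if b >= alpha:
            return True, k                       # exit success via (7)
    if k != n:
        raise ValueError("expected exactly n tuples")
    # Reached only if rounding left B marginally below alpha at k = n while A >= alpha.
    return True, k


def most(x: float) -> float:
    return x * x       # assumption: "most" modeled as Q(x) = x^2


print(quantified_check(most, [0.4, 1.0, 0.8, 0.1, 0.9], 5, 0.73))  # (False, 3)
print(quantified_check(most, [0.1, 0.4, 1.0, 0.8, 0.9], 5, 0.73))  # (False, 1)
```

Keeping the degrees in a sorted list makes each insertion O(k). This is adequate for a sketch; a real DBMS would rather bound k or use a balanced structure.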

3.4  An  Example 

Let us consider the query "find the departments where most of the employees are well-paid, satisfied with a degree greater than 0.73". It is expressed in SQLf as: select 0.73 dep from EMPLOYEE group by dep having most are well-paid. We examine a department (a set of tuples sharing the same value for the attribute department) containing five employees e1 to e5 with the following characteristics:


emp   dep   salary
e1    d     38000
e2    d     55000
e3    d     46000
e4    d     32000
e5    d     48000

Fig. 4. A department


Fig. 5. A fuzzy predicate: well-paid

"Most" is represented by the function x ↦ x². "well-paid" is the membership function given in Fig. 5. So the fuzzy set well-paid is {0.4/e1, 1/e2, 0.8/e3, 0.1/e4, 0.9/e5}.

Evaluation According to Zadeh's Interpretation of Quantifiers. "Most" is represented by the function x ↦ x², thus a = 0.85 and b = 1. This holds because x ∈ [0, 1] and x² ≥ 0.73 is equivalent to x ∈ [0.85, 1], since √0.73 ≈ 0.854. If we perform the overall calculus presented by Zadeh, we get [(0.4 + 1 + 0.8 + 0.1 + 0.9)/5]² = 0.64² ≈ 0.41. Therefore this set does not match our requirement (0.73). Now, let us apply the improved algorithm presented in 3.2, assuming that the tuples are accessed in the order depicted above:

Access to employee e1: μwell-paid(e1) = 0.4 ⇒ S1 = 0.08, S1 + (4/5) = 0.88 ⇒ the loop goes on

Access to employee e2: μwell-paid(e2) = 1 ⇒ S2 = 0.28, S2 + (3/5) = 0.88 ⇒ the loop goes on

Access to employee e3: μwell-paid(e3) = 0.8 ⇒ S3 = 0.44, S3 + (2/5) = 0.84 ⇒ the loop stops because S3 + 2/5 < 0.85.

In this case we save 2 accesses. Moreover, if e4 were the first tuple of the considered set, the loop would have stopped immediately, because (0.1/5 + 4/5) = 0.82 is less than 0.85.
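The trace above can be reproduced with a few lines of Python. This sketch covers only the failure test used in the example, $S_k + (n-k)/n < a$, where $S_k$ is the running sum of the degrees divided by n. It assumes $a = \sqrt{0.73}$ because Q(x) = x². The name `zadeh_trace` is hypothetical.

```python
import math
from typing import List, Tuple


def zadeh_trace(degrees: List[float], alpha: float) -> Tuple[bool, List[tuple]]:
    """Access tuples in order; stop as soon as the proportion
    S_n = (sum of degrees) / n can no longer reach a = sqrt(alpha)."""
    n = len(degrees)
    a = math.sqrt(alpha)          # Q(x) = x^2: Q(x) >= alpha  <=>  x >= sqrt(alpha)
    s = 0.0
    trace = []
    for k, d in enumerate(degrees, start=1):
        s += d / n                # S_k
        bound = s + (n - k) / n   # the (n - k) unread degrees contribute at most 1 each
        trace.append((k, round(s, 2), round(bound, 2)))
        if bound < a:
            return False, trace   # S_n < a is now certain
    return s >= a, trace


ok, steps = zadeh_trace([0.4, 1.0, 0.8, 0.1, 0.9], 0.73)
print(ok, steps)  # False [(1, 0.08, 0.88), (2, 0.28, 0.88), (3, 0.44, 0.84)]
```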

Evaluation According to Yager's Interpretation of Quantifiers. The weight vector W is: W[1] = 0.04, W[2] = 0.12, W[3] = 0.2, W[4] = 0.28, W[5] = 0.36. We first perform the overall calculus for these data, using the naive strategy, which requires access to all 5 tuples: (0.04 * 1) + (0.12 * 0.9) + (0.2 * 0.8) + (0.28 * 0.4) + (0.36 * 0.1) = 0.456. Therefore this set does not match our requirement (0.73).

Now, let us apply the improved algorithm presented in 3.3, assuming that the tuples are accessed in the order depicted above. Since (α + W[5]) is greater than 1, the first condition of the algorithm is interesting (not trivially satisfied). Its threshold is (α + W[n] − 1)/W[n] = (0.73 + 0.36 − 1)/0.36 = 0.25.

Access to employee e1: μwell-paid(e1) = 0.4 ≥ 0.25; A = 0.784 ≥ 0.73; B = 0.016 < 0.73 ⇒ the loop goes on

Access to employee e2: μwell-paid(e2) = 1 ≥ 0.25; A = 0.784 ≥ 0.73; B = 0.088 < 0.73 ⇒ the loop goes on

Access to employee e3: μwell-paid(e3) = 0.8 ≥ 0.25; A = 0.728 < 0.73 ⇒ the loop stops here

In this case, we also save 2 accesses. If e4 were the first tuple of the considered set, the loop would also have stopped immediately, since μwell-paid(e4) = 0.1 is below 0.25, and 4 data accesses would have been saved.

4  Conclusion 

In this paper we have dealt with database management systems which support imprecise queries and in which conventional data are stored. More precisely, we have concentrated on a single type of fuzzy query involving quantifiers and set-oriented predicates. The evaluation of such a query amounts to applying a predicate "Q elements of X are A" (where Q is a quantifier and A a fuzzy predicate) sequentially to several sets X resulting from a partitioning.

We were interested in two interpretations of quantifiers. The first is Yager's interpretation, which can apply only to proportional (i.e. monotonic increasing) quantifiers. The second is Zadeh's interpretation, which was restricted to relative quantifiers. Both interpretations use a mean operator to carry out their calculus.

We have designed some strategies for the evaluation of the considered quantified set-oriented fuzzy queries when a threshold for the degree of satisfaction is given by the user. We started from a naive strategy based on the exhaustive scan of a considered set. We then pointed out some properties of the quantifiers' interpretations that allow for some improvements, especially regarding data access. When the number of elements of the considered set is known, we have shown that conditions can be applied to each element of the set to decide whether or not the calculus has to be continued. These conditions are based on the monotonicity property of mean operators, which gives some heuristics to determine whether or not the set will reach the specified threshold. In the near future, we will perform some simulations in order to estimate the benefit of our improvements.

References 

1.  P.  Bose,  M.  Galibourg,  G.  Hamon;  Fuzzy  querying  with  SQL;  extension  and 
implementation  aspects.  Fuzzy  Sets  and  Systems  28.  333-349  (1988) 

2.  P.  Bose,  O.  Pivert;  About  equivalences  in  SQL^,  a  relational  language  supporting 
imprecise  querying.  Proc.  International  Fuzzy  Engineering  Symposium. 
Yokohama  (Japan):  309-320  (1991) 

3.  D.  Dubois,  H.  Prade;  A  review  of  fuzzy  set  aggregation  connectives.  Information 
Sciences  36,  85-121  (1985) 

4.  H.  Prade;  A  Two-Layer  Fuzzy  Pattern  Matching  Procedure  for  the  Evaluation  of 
Conditions  Involving  Vague  Quantifiers.  Journal  of  Intelligent  and  Robotic 
Systems  3, 93-101  (1990) 

5.  R.R.  Yager;  On  ordered  weighted  averaging  aggregation  operators  in  multicriteria 
decisionmaking,  IEEE  Transactions  on  systems,  Man  and  Cybernetics  18,  183-190 
(1988) 

6.  R.R.  Yager;  Connectives  and  quantifiers  in  fuzzy  sets.  Fuzzy  Sets  and  Systems  40, 
39-75  (1991) 

7.  R.R.  Yager;  Fuzzy  quotient  operators  for  fuzzy  relational  databases,  Proc. 
International  Fuzzy  Engineering  Symposium,  Yokohama  (Japan):  289-296  (1991) 

8.  L.A.  Zadeh;  A  computational  approach  to  fuzzy  quantifiers  in  natural  languages. 
Computer  Mathematics  with  Applications  9.  149-183  (1983) 


Methodologies for Knowledge-Based Software Engineering

Michael R. Lowry

AI Research Branch, MS 269-2
NASA Ames Research Center / Recom Technologies
Moffett Field, CA 94035


Abstract. As the science of knowledge representation and automated reasoning advances, AI has the potential to radically change the artifacts, methodologies, and life cycles of software engineering. The most significant change will come when problems are formalized at the level of specifications rather than programs. This will greatly facilitate software reuse and modification. Achieving this potential requires overcoming many technical challenges, particularly the semi-automated synthesis of efficient and correct programs from specifications. The first part of this paper describes several methodologies for program synthesis and compares their ability to control the combinatorial explosion inherent in automated reasoning.

As knowledge-based software engineering matures and increasingly automates the software engineering life cycle, software engineering resources will shift toward knowledge acquisition and the automated reuse of expert knowledge for developing software artifacts. The second part of this paper describes methodologies for expanding the software life cycle to the knowledge life cycle.


1  Introduction 

In the early sixties, large software projects, such as those undertaken for NASA's program, forced software engineering to mature from an ad hoc endeavor practiced by small teams of programmers to a structured engineering discipline. Structured programming methodologies were developed to cope with the complexities of managing, specifying, designing, and implementing large software systems. Structured designs were captured through hand-drawn diagrams depicting everything from project decomposition to data and control flow. CASE (computer-aided software engineering) emerged in the eighties, when it became economically feasible to computerize structured programming by providing graphical user interfaces to manipulate these diagrams.

KBSE (knowledge-based software engineering) is a much more ambitious endeavor than current approaches to CASE. The key observation is that the current practice of modern software engineering lacks the sound mathematical basis characterizing other engineering disciplines. This limits the complexity of software systems that can be constructed with a high degree of reliability. Formal methods, the application of mathematical logic to software engineering, are just beginning to have an impact on real software engineering practice. The goal of KBSE is nothing less than the computerization of formal methods for all phases of the software life cycle [8].

KBSE addresses the essential tension between problem specification and efficient solution implementation. This tension makes it difficult to modify and reuse programs, since efficient code incorporates constraints from all parts of a problem specification in the optimization of individual program fragments. Hence, local incremental changes to a problem specification often require extensive non-local changes to optimized code. Modification of production-quality code is so time consuming that maintenance costs currently dominate software life cycle resources. Furthermore, reuse of production-quality code has been difficult to achieve.

Advances in programming language and compiler technology have raised the level of programming abstractions. However, they have not addressed the essential difference between optimized code suitable for efficient computation and formal problem specifications suitable for reuse and modification. The current paradigms for programming languages cannot in principle bridge this gap. To guarantee compiler performance, the peephole on the source code used in optimizing machine language code must be limited.

KBSE bridges this gap by introducing a new software development paradigm [8]. Problems are first formalized at the level of the declarative semantics of an application domain. They are then semi-automatically transformed to the operational semantics of an efficiently compilable programming language. Formal methods provide the mathematical basis for this transformation, and automated reasoning provides the means for carrying it out. Raising the level at which problems are formalized will greatly facilitate modification and reuse. Furthermore, by introducing formal artifacts earlier in the software life cycle, mechanized support can be provided for the full spectrum of software engineering activities, from requirements engineering to validation and maintenance.

Evolutionary improvements of current software engineering methodologies can be achieved with existing KBSE technology. Particularly promising are domain-specific program synthesis tools and re-engineering tools to modify and maintain existing code. However, achieving the full KBSE paradigm will require many technical advances. Foremost among these are search control for automated reasoning, and interactive assistance in requirements formalization and validation.

Raising the level at which problems are formalized from the programming level to the specification level will eliminate many conceptual and design errors. These errors can cost over a hundred times more to fix during the testing phase than simple coding errors [1]. This is one major motivation for applying formal methods even without computer-aided assistance. However, even at the specification level, formalization is a difficult process. Many of the most costly software errors can be traced back to the transformation from informal requirements to specifications, so new methods for specification validation are needed.

AI programming environments have already contributed to one such method, rapid prototyping. In rapid prototyping, executable specifications are developed in a very high-level programming language and then validated interactively with end users. Other AI approaches to specification validation are described in [17, 18, 24] and their references.

This  paper  first  describes  several  methodologies  for  program  synthesis  with  an 
emphasis  on  search  control,  drawing  upon  the  literature  and  the  author's  own  work.  It  then 
contrasts  knowledge-based  methodologies  for  software  engineering  with  methodologies 
based  on  CASE,  and  describes  the  evolution  from  the  software  engineering  life  cycle  to 
the  knowledge  engineering  life  cycle. 




2  Methodologies  for  Program  Synthesis 


2.1  Constructive  Theorem  Proving 

Soon after Robinson [27] developed resolution as the first practical means for automated theorem proving in predicate logic, it was applied to automatic program synthesis. Green [6] and Waldinger [33] demonstrated the generation of small programs, such as sorting algorithms, through constructive proofs from specifications of the form:

$$\forall x\, \exists y\; \mathrm{Precondition}(x) \rightarrow \mathrm{Postcondition}(x, y)$$

In this specification schema, x is a vector of input variables, y is a vector of output variables, and Precondition(x) is a formula constraining the input variables. A constructive proof binds y to a term which makes the specification a theorem. If this term is composed of functions in the programming language, and the only variables in the term are input variables, then the term represents a functional program.

The inference process is similar to logic programming:
1. The universally quantified variables in x are replaced by unique constants.
2. The formula Precondition(x) is asserted over these constants.
3. Postcondition(x, y) is negated, and resolution is repeatedly applied until a refutation is derived.

The program term is built up through unification with the output variables y. Recursive and iterative constructs are derived through inductive proofs using inductive schemata, or through additional inference rules. Manna and Waldinger [19] later developed an elegant nonclausal variation suitable for manually controlled derivations.
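To make the role of unification concrete, here is a toy sketch of answer extraction. It covers only the Horn-clause (Prolog-style) special case; it is not the systems of [6] or [33]. The input variable x has already been replaced by the constant `c`, and Precondition(c) is asserted. The negated goal post(c, Y) is then refuted by depth-first linear resolution. The binding accumulated for the output variable Y is the program term. The predicates `post`, `mid`, `pre` and the functions `g`, `h` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


Term = Union[Var, str, tuple]          # str: constant; tuple: (functor, *args)
Subst = Dict[Var, Term]
Clause = Tuple[tuple, List[tuple]]     # (head atom, body atoms)


def walk(t: Term, s: Subst) -> Term:
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def occurs(v: Var, t: Term, s: Subst) -> bool:
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))


def unify(a: Term, b: Term, s: Subst) -> Optional[Subst]:
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, Var):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def rename(t: Term, tag: int) -> Term:
    """Fresh copy of a clause's variables for each use."""
    if isinstance(t, Var):
        return Var(f"{t.name}#{tag}")
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, tag) for a in t[1:])
    return t


def prove(goals: List[tuple], clauses: List[Clause], s: Subst, depth: int = 0) -> Optional[Subst]:
    """Depth-first linear resolution; returns the substitution of a refutation."""
    if not goals:
        return s                       # empty clause reached
    first, rest = goals[0], goals[1:]
    for head, body in clauses:
        s2 = unify(first, rename(head, depth), s)
        if s2 is not None:
            new_goals = [rename(atom, depth) for atom in body] + rest
            result = prove(new_goals, clauses, s2, depth + 1)
            if result is not None:
                return result
    return None


def resolve(t: Term, s: Subst) -> Term:
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t


X, Z, Y = Var("X"), Var("Z"), Var("Y")
CLAUSES: List[Clause] = [
    (("post", X, ("g", Z)), [("mid", X, Z)]),   # hypothetical axiom
    (("mid", X, ("h", X)), [("pre", X)]),       # hypothetical axiom
    (("pre", "c"), []),                         # Precondition(c) asserted
]
answer = prove([("post", "c", Y)], CLAUSES, {})
print(resolve(Y, answer))  # ('g', ('h', 'c')), i.e. the program y = g(h(x))
```

Depth-first search with no depth bound can loop on recursive axioms. This is one small face of the search-control problem discussed next.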

Initially it was hoped that advances in generic theorem proving strategies would control search sufficiently to enable automated derivations to scale up to large problems. Impressive progress was made during the early seventies [3]. Even so, generic resolution strategies were never able to mitigate combinatorial explosion effectively. Program synthesis often requires deep reasoning, and generating recursive and iterative programming constructs through inductive proofs considerably expands the combinatorial explosion. In retrospect, it is unlikely that general-purpose theorem-proving strategies will ever be sufficient to control the combinatorial search inherent in automated program synthesis.

2.2  Program  Transformations 

An alternative approach to program synthesis is incremental transformation of specifications to implementations through program transformations, i.e. oriented rewrite rules. In its purest form, the transformational approach is formalized by the semantics of conditional equational logic. In its more restricted variants, the transformational approach can greatly reduce search [23]. It is also more adaptable than theorem proving to less formal knowledge engineering approaches, and can be viewed as an extension of current compiler technology. Transformations are typically oriented from higher-level specification constructs to lower-level implementation constructs, thus providing an overall direction to the search.

Sets of rewrite rules are characterized by properties such as termination and confluence (guaranteed termination in a unique normal form). Given an appropriate weighting scheme for terms and a set of rewrite rules, the Knuth-Bendix completion procedure [12] will add rewrite rules until the set becomes confluent. The Knuth-Bendix completion procedure is not guaranteed to terminate. It generally works only on small sets of rules and for restricted kinds of weighting schemes.
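Oriented rewriting to a normal form can be illustrated with a toy system. The sketch uses a term representation of our own: tuples for applications, and strings starting with `?` for rule variables. It applies two textbook rules for Peano addition. These rules do not overlap and form a terminating, confluent system, so every term has a unique normal form. The sketch illustrates rule application only, not Knuth-Bendix completion.

```python
from typing import Dict, List, Optional, Tuple, Union

Term = Union[str, tuple]        # "?x": rule variable, other str: constant, tuple: (f, *args)
Rule = Tuple[Term, Term]        # oriented: left-hand side -> right-hand side
Binding = Dict[str, Term]


def match(pattern: Term, term: Term, b: Binding) -> Optional[Binding]:
    """One-way matching: only variables of the pattern get bound."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in b:
            return b if b[pattern] == term else None
        return {**b, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            b = match(p, t, b)
            if b is None:
                return None
        return b
    return b if pattern == term else None


def substitute(t: Term, b: Binding) -> Term:
    if isinstance(t, str):
        return b.get(t, t)
    return (t[0],) + tuple(substitute(a, b) for a in t[1:])


def normalize(t: Term, rules: List[Rule]) -> Term:
    """Innermost rewriting: normalize the arguments, then try each rule at the root."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(a, rules) for a in t[1:])
    for lhs, rhs in rules:
        b = match(lhs, t, {})
        if b is not None:
            return normalize(substitute(rhs, b), rules)
    return t


RULES: List[Rule] = [
    (("plus", "0", "?y"), "?y"),
    (("plus", ("s", "?x"), "?y"), ("s", ("plus", "?x", "?y"))),
]
two, one = ("s", ("s", "0")), ("s", "0")
print(normalize(("plus", two, one), RULES))  # ('s', ('s', ('s', '0')))
```

Each rule is applied only from left to right, which is what gives the search its direction. Reversing a rule, as in folding, would reopen the choice at every step.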




Restricted variants of program transformations have well-behaved search properties. Despite this, obtaining the same problem-solving power as constructive theorem proving ultimately requires addressing the same combinatorial search issues. One way of increasing problem-solving capability is to make the conditions for applying an inference rule more complex. However, as the complexity is increased, the amount of inference required typically increases exponentially.

Furthermore, introducing general capabilities for producing recursive and iterative constructs requires expanded capabilities such as folding [4]. In folding, a rewrite rule is reversed to introduce the application of a function from an instance of its definition. This reversal of rewrite rule orientation leads to a combinatorial explosion of possibilities. In general, as the scope of a transformation system is expanded to encompass a larger set of possible programs as output, the search space expands drastically.

2.3  Manually  Guided  Program  Synthesis/Verification 

At the opposite extreme to totally automated search control, several early systems (e.g. [2]) had the user select each primitive step of the inference or transformation process. Initially it was hoped that this approach would be a viable means for mechanically assisted program synthesis or verification. However, the sheer number of steps required made this approach infeasible outside of research settings, or outside applications such as avionics that require extreme reliability. It was far easier to develop a program by hand than to guide an inference system through the large number of primitive steps.

Both extremes have proved problematic: totally automated search control using generic strategies, and manual guidance of primitive inference/transformation rules. These problems have led to methodologies centered on human/computer partnerships and reuse. These methodologies include:
- the intelligent assistant approach, in which humans make strategic decisions while the computer carries out bounded searches;
- the reuse of generic programs or derivations;
- the encoding of tactical and strategic program design knowledge.

Each of these methodologies introduces interacting knowledge representation and automated reasoning issues.

2.4  Intelligent  Programming  Assistant 

Floyd [5] presented an early vision of an intelligent programming assistant, in which the computer kept track of clerical details while the human made the important strategic decisions. A key issue is developing representations for program derivations that are human-comprehensible and machine-manipulable. The decision making also has to be factored to limit the search carried out by automated reasoning, while still presenting meaningful strategic decisions to the human user. These constraints rule out certain technologies such as clausal resolution.

The programmer’s assistant project at MIT, spanning the years 1973 to 1992, was an influential effort, particularly in the area of re-engineering. The main achievements of the early years were the development of the plan formalism and the demonstration of KBEmacs. The plan formalism is a language-independent representation for programs and programming knowledge. KBEmacs is an editor for manipulating programs in this formalism.

The plan formalism represented programs as flowcharts with explicit data and control flow arcs. Its main innovation was support for programming cliches: reusable algorithmic fragments (such as enumeration over a file) that were engineered to correspond to expert human programming knowledge. Analyzers were developed for several programming languages that recognized instances of cliches in program text.

The combination of analyzers and translators between the plan formalism and program text enabled significant re-engineering capabilities [34]. These included modification of programs at the level of programming cliches. They also included improved translation of programs between programming languages, via abstraction to the plan formalism and then reimplementation in the target programming language.

However, the plan formalism lacked the semantic basis to provide generality and power; reasoning was carried out by ad hoc procedures. This limited the feasibility of extending the capabilities of KBEmacs. To address these limitations, Rich [25] formalized the plan formalism into the plan calculus and then developed CAKE [26], a layered automated reasoning system. CAKE is a careful integration of different automated reasoning capabilities (e.g. truth maintenance, propositional reasoning, equality, and types) that appears as an active knowledge base for software artifacts. CAKE's automatically invoked inference procedures are constrained to run in polynomial time; user queries can invoke more time-consuming reasoning procedures.

Significant new capabilities for the programmer’s apprentice were implemented on top of the plan calculus and CAKE, including a debugging assistant [14] and a requirements assistant (RA) [28]. The RA is a good example of the interactive problem formalization assistance that can be provided through KBSE. It notified a user when it detected ambiguity, contradiction, incompleteness, or inaccuracy in an evolving requirements specification.

Several lessons can be learned from the evolution of MIT's programmer's assistant
project.  First,  although  a  knowledge  engineering  approach  is  useful  in  the  initial 
development  of  a  representation  meaningful  to  humans,  achieving  generality  and  power 
requires  a  semantically  well  defined  formal  representation  with  semantically  well 
founded  inference  procedures.  Without  these,  the  implicit  assumptions  which  facilitate 
an  ad  hoc  approach  become  limitations  hindering  the  expansion  of  a  KBSE  system.  These 
factors  will  limit  the  expansion  of  current  CASE  systems,  because  of  their  shallow 
representations  and  ad  hoc  reasoning  procedures.  Second,  developing  the  automated 
reasoning  capabilities  to  support  a  formal  representation  requires  significant  engineering 
to  avoid  combinatorial  explosion. 

2.5  Replaying  Program  Derivations 

An alternative to reusing high-level generic program fragments such as cliches is the reuse of program derivations. This approach spans the range from rote replay of derivations to derivational analogy [20]. Program derivation reuse is particularly appealing because it could support the incremental modification of specifications. Efficient implementations would be rederived through replay of the original derivation. When the replay system encounters part of the derivation which is no longer applicable, it transfers control to the user.

Derivational analogy replay systems have been successfully applied in domains such as VLSI design, where the mapping from the input of the derivation system to the output is localized; that is, each part of the output is attributable to localized parts of the input. However, as discussed earlier, optimized code must potentially incorporate constraints from all parts of a specification. For this reason, substantial parts of the original program derivation might no longer be applicable after an incremental change in specification. To date, most derivational analogy replay systems for program synthesis have operated on representations of the enablement structure of transformation or inference rules. There is typically no representation in the derivation record of the purpose for applying a
transformation  in  meeting  a  performance  goal  for  the  optimized  code  (however,  see  [^  5)). 




Thus there is insufficient information for making good analogies to other derivations when parts of the original derivation are no longer applicable. For these reasons, derivational analogy replay systems have so far had little more success than rote replay systems in program synthesis.

2.6  Design  Analysis 

Generic theorem proving and transformational strategies have had limited success in automating program synthesis. A more promising methodology is to develop efficient tactics for controlling automated reasoning for particular classes of software artifacts. Each tactic can be viewed in itself as a special-purpose program synthesizer. However, because tactics are control programs for general-purpose inference mechanisms, they can be easily combined. The following describes one methodology for developing tactics within the context of parameterized theories and algebraic specifications. The example used is the development of a tactic for synthesizing local search algorithms; more details can be found in [15].

Design analysis is a methodology for formalizing two kinds of properties [30]: the structural properties common to a class of software artifacts, and the generic properties common to their derivations. This formalization is then used to develop a design tactic that automatically designs an artifact in this class, given a specification of its behavior. Design analysis formalizes intrinsic structural properties rather than properties specific to a particular programming language or application domain. By abstracting away these particular concerns, the resulting formalization is more broadly applicable. The objective is to find a general mathematical characterization of the structure of a class, while at the same time capturing the features that provide search guidance for designing artifacts in that class.

The first step of design analysis for algorithm synthesis is to study many examples
of a naturally defined class of algorithms. For example, local search algorithms, also
referred to as hill-climbing algorithms, are a natural class in which a feasible solution to
an optimization problem is iteratively improved by searching a neighborhood of the
solution for a better solution, stopping when no neighboring solution is better. The
second step of design analysis is to extract the features and structural constraints
characterizing that class of algorithms. The neighborhood structure determines the
properties of a local search algorithm: exact neighborhood structures guarantee that local
optima are global optima, while the weaker condition of reachability guarantees
that all feasible solutions for a given input are mutually reachable. Reachability is a
necessary condition for variants of local search that can backtrack out of local optima,
such as simulated annealing, to converge on global optima. The third step is to
formalize this characterization in a theory. The theory of neighborhood structures for
local search algorithms is an extension of the theory for optimization problems.
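To make the iterative-improvement loop concrete, here is a minimal Python sketch of generic local search. It assumes the neighborhood is supplied as a function returning candidate solutions and that lower cost is better; the names `local_search` and `int_neighbors` and the toy problem are hypothetical illustrations of the algorithm class, not code from [15].

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")

def local_search(start: S,
                 neighbors: Callable[[S], Iterable[S]],
                 cost: Callable[[S], float]) -> S:
    """Hill climbing: repeatedly move to the best strictly improving neighbor."""
    current = start
    while True:
        better = [n for n in neighbors(current) if cost(n) < cost(current)]
        if not better:
            return current                 # local optimum: no neighbor is better
        current = min(better, key=cost)    # steepest descent among improving moves

def int_neighbors(x: int) -> list:
    """Toy neighborhood on the integers 0..20: step left or right by one."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]

print(local_search(0, int_neighbors, lambda x: (x - 7) ** 2))   # 7
```

With a convex cost such as this one the step-by-one neighborhood happens to be exact, so the local optimum found is also the global one; with a non-convex cost the same loop can stop at a local optimum.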

A basic problem is specified by defining a set of inputs $D$, a set of outputs $R$, an
operation $I$ that maps legal inputs to true, and an operation $O$ that maps input/output pairs
to true when the output is a feasible solution to the input. A basic problem specification
is a tuple $B = \langle D, R, I, O \rangle$.

An optimization problem is specified by extending a basic problem specification
with an ordering relation in which all pairs of feasible solutions are comparable. All such
ordering relations can be formulated as a cost function that maps feasible solutions to a
totally ordered set. For most problems the cost function maps feasible solutions to the
integers, rationals, or reals. The totally ordered set is denoted $\langle \Re, \le \rangle$, where $\Re$ is the set




and $\le$ is the total order relation. Thus an optimization problem is specified through a tuple
$Opt = \langle D, R, I, O, \Re, \le, \mathit{cost} \rangle$.
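The tuples above can be transcribed directly into a data structure. The following is a minimal sketch, assuming the totally ordered set $\Re$ is modeled by Python numbers under `<=` and that `optimal` is evaluated against a finite enumeration of outputs supplied by the caller; the class name, field names, and the example problem are all hypothetical. The first four components correspond to the basic problem $B$; the cost function adds the optimization structure.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

D = TypeVar("D")   # sort of inputs
R = TypeVar("R")   # sort of outputs

@dataclass(frozen=True)
class OptProblem(Generic[D, R]):
    """Opt = <D, R, I, O, Re, <=, cost>, with Re modeled as Python numbers under <=."""
    legal: Callable[[D], bool]          # I: legal inputs
    feasible: Callable[[D, R], bool]    # O: feasible input/output pairs
    cost: Callable[[D, R], float]       # cost: D x R -> Re

    def optimal(self, x: D, y: R, outputs: Iterable[R]) -> bool:
        """Optimal(x, y): y costs no more than any feasible y' among `outputs`."""
        return all(self.cost(x, y) <= self.cost(x, z)
                   for z in outputs if self.feasible(x, z))

# Hypothetical problem: given a non-empty list, find an index of a minimal element.
argmin_problem = OptProblem(
    legal=lambda xs: len(xs) > 0,
    feasible=lambda xs, i: 0 <= i < len(xs),
    cost=lambda xs, i: xs[i],
)
xs = [4, 2, 9]
print([i for i in range(len(xs)) if argmin_problem.optimal(xs, i, range(len(xs)))])  # [1]
```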

A local search theory $LS = \langle Opt, N \rangle$ is specified by an optimization problem and a
neighborhood relation. Three axioms, two of them optional, constrain the neighborhood
relation, which is a ternary relation between an input and two elements of the output
domain. First, each feasible solution is in its own neighborhood, so that for any legal input
the neighborhood relation is a reflexive relation on feasible outputs (Axiom LS1). If the
neighborhood structure is exact, then the local search theory will be called exact (Axiom
LS2). Likewise, if the neighborhood structure is reachable, the local search theory will
be called reachable (Axiom LS3). A local search theory for a particular optimization
problem is defined by a mapping from the components of abstract local search theory to
definitions of objects, functions and relations in the problem domain. More formally, the
mapping is a theory interpretation, which means that the abstract axioms are true when
they are mapped to the problem domain theory. Abstract local search theory is defined
as follows:


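Sorts $D, R, \Re$

Operations

$$
\begin{aligned}
I &: D \to \mathit{boolean}\\
O &: D \times R \to \mathit{boolean}\\
\mathit{cost} &: D \times R \to \Re\\
\le &: \Re \times \Re \to \mathit{boolean}\\
N &: D \times R \times R \to \mathit{boolean}
\end{aligned}
$$

$$\mathit{Optimal}(x, y) \equiv \forall (y')\ O(x, y') \Rightarrow \mathit{cost}(x, y) \le \mathit{cost}(x, y')$$

Axioms

LS1: Reflexive Neighborhood

$$\forall (x, y)\ I(x) \wedge O(x, y) \Rightarrow N(x, y, y)$$

LS2: Exact Neighborhood

$$\forall (x, y)\ I(x) \wedge O(x, y) \wedge [\forall (y')\ O(x, y') \wedge N(x, y, y') \Rightarrow \mathit{cost}(x, y) \le \mathit{cost}(x, y')] \Rightarrow \mathit{Optimal}(x, y)$$

LS3: Reachable Neighborhood

$$\forall (x, y, y')\ I(x) \wedge O(x, y) \wedge O(x, y') \Rightarrow N^*(x, y, y')$$

where $N^*$ is the reflexive and transitive closure of $N$:

$$
\begin{aligned}
&\forall (x, y)\ N^0(x, y, y) \equiv I(x) \wedge O(x, y)\\
&\forall (k \in \mathit{Nat}; x, y, y')\ N^{k+1}(x, y, y') \equiv \exists (z)\ O(x, z) \wedge N^k(x, y, z) \wedge N(x, z, y')\\
&\forall (x, y, y')\ N^*(x, y, y') \equiv \exists (k \in \mathit{Nat})\ N^k(x, y, y')
\end{aligned}
$$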
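For small finite instances the axioms can be checked by brute force, which helps in seeing what exactness and reachability demand. A sketch, assuming a finite output domain; the function `check_axioms` and the sample instances are hypothetical. The reachability search passes only through feasible intermediate solutions, mirroring the $O(x, z)$ conjunct in the definition of $N^{k+1}$.

```python
def check_axioms(x, outputs, I, O, cost, N):
    """Brute-force check of LS1-LS3 for one input x over a finite output domain."""
    if not I(x):
        return {"LS1": True, "LS2": True, "LS3": True}   # axioms are vacuous for illegal x
    feas = [y for y in outputs if O(x, y)]

    # LS1: every feasible solution is in its own neighborhood.
    ls1 = all(N(x, y, y) for y in feas)

    # LS2: a feasible y no worse than its feasible neighbors must be Optimal(x, y).
    def no_worse_than(y, zs):
        return all(cost(x, y) <= cost(x, z) for z in zs)
    ls2 = all(no_worse_than(y, feas)
              for y in feas
              if no_worse_than(y, [z for z in feas if N(x, y, z)]))

    # LS3: N* links every pair of feasible solutions via feasible intermediate steps.
    def reach(y):
        seen, frontier = {y}, [y]
        while frontier:
            z = frontier.pop()
            for w in feas:
                if w not in seen and N(x, z, w):
                    seen.add(w)
                    frontier.append(w)
        return seen
    ls3 = all(reach(y) == set(feas) for y in feas)
    return {"LS1": ls1, "LS2": ls2, "LS3": ls3}

# Hypothetical instance: x is a cost table, outputs are indices into it,
# and N links indices that differ by at most 1.
args = dict(outputs=range(6), I=lambda x: True,
            O=lambda x, y: 0 <= y < len(x), cost=lambda x, y: x[y],
            N=lambda x, y, z: abs(y - z) <= 1)
print(check_axioms((5, 3, 1, 0, 2, 4), **args))  # {'LS1': True, 'LS2': True, 'LS3': True}
print(check_axioms((3, 1, 2, 0, 4, 5), **args))  # {'LS1': True, 'LS2': False, 'LS3': True}
```

In the second table, index 1 is no worse than its neighbors but is not globally optimal, so the neighborhood is reachable without being exact.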


To derive a local search algorithm for a particular optimization problem, a partial
mapping from this abstract local search theory to the components of an optimization
problem is first created. Constraints for a suitable neighborhood relation are then derived
by instantiating the abstract neighborhood axioms with these components. The main part
of the design tactic is to derive the definition of a neighborhood relation from these
constraints in terms of the problem domain. Once the neighborhood relation is defined,




an initial algorithm can be derived by instantiating a program schema with the
components of the derived local search theory. This high-level algorithm can be further refined
with optimization tactics such as partial deduction [13] and finite differencing [21].

Formalizing the structure of a class of software artifacts is by itself usually
insufficient for providing mechanized design assistance: it is also necessary to formalize
the structure of the derivations. In the example of local search algorithms, the axioms for
reachability and exactness defined above are too general to avoid combinatorial
explosion in automated reasoning. The axioms for reachability require induction over the
transitive closure of neighborhoods, which can be difficult for automated theorem
provers. The exact neighborhood axiom, as stated, does not provide sufficient structure
for determining its satisfaction for most problems. (Typically, proofs for exact
neighborhoods are done through reduction to problems with known exact neighborhoods, such as
linear programming, or through lemmas about convex functions.)

To provide heuristic adequacy for guiding derivations, various specializations of the
general structure are derived. For local search algorithms whose neighborhoods are
reachable but not necessarily exact, most neighborhoods for efficient local search
algorithms can be described as natural perturbations of data structures: “[The key step in
deriving a local search algorithm is the] selection of a neighborhood or a class of
neighborhoods, and this is tied to the notion of a ‘natural’ perturbation of a feasible
solution” ([22] pg. 469). The theory of groups and group actions provides the
mathematical basis for formalizing natural perturbations.

A natural perturbation neighborhood is defined for a data structure by a set of
permutations and a group action mapping these permutations to perturbations of each
instance of the data structure. Thus the neighborhoods for all instances of the data
structure are similar extensionally and have the same intensional description based on the
set of permutations. A permutation is any one-to-one (and hence invertible) function from
some set of objects to the same set. The closure of this set of permutations under
composition, together with the group action, defines three interrelated structures: a group
of permutations, the mutually reachable data structures, and the invariant properties of
mutually reachable data structures.
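The three structures can be observed on a small example. The following sketch assumes the data structure is an ordered sequence and the permutations are all transpositions of its indices; the function names are hypothetical. The orbit computed by breadth-first search is the set of mutually reachable sequences, and the multiset of elements is an invariant of every transposition. Because the orbit size equals the number of distinct rearrangements of that multiset, the orbit is exactly the class of sequences sharing the invariant.

```python
from collections import Counter, deque
from itertools import combinations
from math import factorial

def transpose(seq, i, j):
    """Group action of the transposition (i j) on a sequence: swap positions i and j."""
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def orbit(seq):
    """Sequences reachable from seq by composing transpositions (breadth-first search)."""
    start = tuple(seq)
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for i, j in combinations(range(len(s)), 2):
            t = transpose(s, i, j)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def invariant(seq):
    """Property preserved by every transposition: the multiset of elements."""
    return tuple(sorted(seq))

def rearrangements(seq):
    """Number of distinct sequences with the same multiset: n! / prod(k!)."""
    count = factorial(len(seq))
    for k in Counter(seq).values():
        count //= factorial(k)
    return count

o = orbit("abca")
print(len(o), rearrangements("abca"))                      # 12 12
print(all(invariant(s) == invariant("abca") for s in o))   # True
```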

The specialization of reachable neighborhoods to natural perturbations entails only
two restrictions on the reachable neighborhoods axiomatized in the abstract theory of
local search. First, neighborhoods are required to be symmetric, that is, if y is in x's
neighborhood then x is in y's neighborhood. Most local search algorithms satisfy this
condition. This condition ensures that if z is reachable from w then w is reachable from
z. Second, the neighborhoods of all feasible solutions are similar; they have the same
intensional description in terms of the set of permutations. These two restrictions are
sufficient to enable the tools of group theory to be used in developing reachable
neighborhood structures for a wide variety of optimization problems. The mathematics
and proofs are fully developed in [15].

Specializing reachable neighborhoods to natural perturbation neighborhoods
considerably simplifies automated reasoning. In particular, it is no longer necessary to do an
inductive proof on the transitive closure of neighborhoods: reachability is ensured if and
only if the invariant properties of a natural perturbation neighborhood are equivalent to
the feasibility constraints for problem solutions. If the invariants are stronger than the
feasibility constraints then some feasible solutions would not be reachable from other
feasible solutions. If the invariants are weaker than the feasibility constraints then some
feasible solutions would be mapped to infeasible solutions.
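The criterion can be illustrated on fixed-size subsets with element-swap neighborhoods. In this sketch (all names hypothetical), `movable` restricts which elements the permutations may move, which changes the invariants of the neighborhood; `diagnose` reports whether all feasible solutions are mutually reachable and whether every neighbor of a feasible solution is still feasible.

```python
from itertools import combinations

def swaps(y, movable):
    """Neighbors of subset y: exchange some j in y for some i outside y,
    where both i and j must belong to `movable`."""
    return {(y - {j}) | {i} for i in movable - y for j in y & movable}

def reachable_from(start, movable, feasible):
    """Feasible subsets reachable from start via feasible intermediate subsets."""
    seen, stack = {start}, [start]
    while stack:
        y = stack.pop()
        for z in swaps(y, movable):
            if z not in seen and feasible(z):
                seen.add(z)
                stack.append(z)
    return seen

def diagnose(universe, m, movable, feasible):
    """Return (feasible solutions mutually reachable, neighbors of feasible solutions feasible)."""
    feas = {frozenset(c) for c in combinations(sorted(universe), m) if feasible(frozenset(c))}
    mutually_reachable = all(reachable_from(y, movable, feasible) == feas for y in feas)
    closed = all(feasible(z) for y in feas for z in swaps(y, movable))
    return mutually_reachable, closed

U = frozenset({1, 2, 3, 4})
print(diagnose(U, 2, U, lambda y: len(y) == 2))             # (True, True)
print(diagnose(U, 2, U, lambda y: len(y) == 2 and 1 in y))  # (True, False)
print(diagnose(U, 2, U - {1}, lambda y: len(y) == 2))       # (False, True)
```

In the first case the only invariant (size) matches feasibility exactly. In the second, feasibility also requires element 1, so the invariant is weaker and some swaps leave the feasible set. In the third, the permutations never move element 1, so "contains 1" becomes an extra invariant that is stronger than feasibility, and subsets such as {2, 3} cannot be reached from {1, 2}.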




The local search design tactic developed in this approach matches a problem
specification to a library theory whose invariants are equivalent to or weaker than the
feasibility constraints, and then specializes the library theory if the invariants are weaker.
The library theories are defined for general set-theoretic data structures, such as ordered
sequences, as explained below. Reasoning about invariant properties and feasibility
constraints provides a computationally tractable method of matching and then
specializing theories in a library to problem specifications. The theorem prover does not have to
reason directly about the second-order reachability axioms; this has already been done
by the creator of the library theories.

Library theories are based on basic neighborhoods, which are the subclass of natural
perturbation neighborhoods in which the permutations are restricted to be all the
transpositions of some underlying set, that is, permutations in which only two elements
are interchanged. For example, one basic neighborhood structure for an ordered sequence
is defined by all the transpositions of the indices of the sequence. Basic neighborhoods
are typically overly general for any particular problem; the design tactic first matches a
problem specification to a basic neighborhood and then specializes the basic
neighborhood. The current library of parameterized natural perturbation neighborhood theories
consists of a half dozen basic neighborhood definitions, which include specifications of
their invariants. A basic neighborhood has the following definition schema as a ternary
relation, where $y, y'$ are neighboring feasible solutions with respect to input $x$, and $i, j$ are
the elements that are transposed:

$$\lambda x, y, y'.\ \exists (i, j \in S)\ y' = \mathit{Action}(x, y, i, j)$$
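The schema can be read as a higher-order construction: given an index set and an action, it yields a neighborhood relation. Below is a minimal Python sketch, assuming ordered sequences as the data structure, the sequence length as the input $x$, and the index set of the sequence in the role of $S$; `basic_neighborhood` and `swap_indices` are hypothetical names. Pairs with $i = j$ give the identity action, so this particular instance also satisfies the reflexivity axiom LS1.

```python
from itertools import product

def basic_neighborhood(F, action):
    """Return neighbors(x, y) and N(x, y, y2) for the schema
    N(x, y, y') <=> exists (i, j in F(x)) . y' == action(x, y, i, j)."""
    def neighbors(x, y):
        idx = list(F(x))
        return {action(x, y, i, j) for i, j in product(idx, repeat=2)}
    def N(x, y, y2):
        return y2 in neighbors(x, y)
    return neighbors, N

def swap_indices(x, y, i, j):
    """Action for ordered sequences: transpose the elements at indices i and j."""
    s = list(y)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

# Ordered-sequence instance: x is the sequence length and F(x) is its index set.
seq_neighbors, seq_N = basic_neighborhood(lambda n: range(n), swap_indices)
print(sorted(seq_neighbors(3, ("a", "b", "c"))))
# [('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('c', 'b', 'a')]
```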

A  local  search  library  theory  for  a  basic  neighborhood  consists  of  the  basic 
neighborhood  definition  and  definitions  for  the  other  components  of  a  local  search  theory. 
It  is  presented  as  a  mapping  of  the  following  form  from  abstract  local  search  theory  to  a 
set  of  definitions: 

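$$
\begin{aligned}
&LS \to \text{basic theory}\\
&D \mapsto \mathit{datatype}_1(\alpha)\\
&R \mapsto \mathit{datatype}_2(\alpha)\\
&I \mapsto \lambda x.\ P(x)\\
&O \mapsto \lambda x, y.\ \mathit{Invariant}(x, y)\\
&N \mapsto \lambda x, y, y'.\ \exists (i, j \in F(x))\ y' = \mathit{Action}(x, y, i, j)
\end{aligned}
$$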

For example, the following mapping from abstract local search theory defines the
basic neighborhood structure for same-sized subsets of a given finite set S. The subsets
have a constant size m, and the elements that are transposed are the elements of the
finite set:

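$$
\begin{aligned}
&LS \to \text{subset theory}\\
&D \mapsto \mathit{set}(\alpha) \times \mathit{integer}\\
&R \mapsto \mathit{set}(\alpha)\\
&I \mapsto \lambda S, m.\ m \le \mathit{size}(S)\\
&O \mapsto \lambda S, m, y.\ y \subseteq S \wedge \mathit{size}(y) = m\\
&N \mapsto \lambda S, m, y, y'.\ \exists (i, j)\ i \in (S - y) \wedge j \in y \wedge y' = (y \cup \{i\}) - \{j\}
\end{aligned}
$$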
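A direct transcription of the subset theory follows, as a sketch with hypothetical function names. Since $i$ ranges over the $|S| - m$ elements outside $y$ and $j$ over the $m$ elements of $y$, and each pair yields a different $y'$, every feasible subset has exactly $m(|S| - m)$ neighbors, all of which satisfy $O$.

```python
def subset_I(S, m):
    """I: the requested subset size cannot exceed the size of S."""
    return m <= len(S)

def subset_O(S, m, y):
    """O: y is a subset of S with exactly m elements."""
    return y <= S and len(y) == m

def subset_neighbors(S, m, y):
    """N(S, m, y, y') <=> exists i in S - y, j in y . y' = (y | {i}) - {j}."""
    return {(y | {i}) - {j} for i in S - y for j in y}

S = frozenset("abcde")
y = frozenset("ab")
nbrs = subset_neighbors(S, 2, y)
print(len(nbrs))                               # 6, i.e. m * (|S| - m) = 2 * 3
print(all(subset_O(S, 2, z) for z in nbrs))    # True
```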

This  theory  can  be  matched  to  a  wide  range  of  problems  including  the  class  of  ACS 
(additive  cost  subset)  problems  defined  by  Savage.  A  typical  example  is  the  minimal 




spanning tree problem (MST), which is to find a minimally weighted subset of edges in
a graph that spans the nodes of the graph without any cycles.

Given the specification of an optimization problem, the local search design tactic
first matches the problem specification to a library theory for a basic neighborhood, and
then specializes the library theory by finding necessary conditions on transpositions to
ensure that feasible solutions are transformed to better feasible solutions. The design
tactic takes the following steps. Each step is a well-defined inference problem with a
manageable search space; the overall effect is to replace a large search space with a
sequence of smaller search spaces:

1.  Retrieve  and  match  basic  neighborhood  theories  from  the  library  indexed  by  the  type 
of  feasible  solution.  A  theory  matches  a  problem  specification  if  the  invariants  of  the 
theory  are  necessary  conditions  of  the  feasibility  constraints  of  the  specification. 

2.  Determine  necessary  preconditions  on  the  transpositions  that  ensure  that  a  feasible 
solution  is  perturbed  to  a  feasible  solution. 

3.  Determine  necessary  preconditions  on  the  transpositions  that  ensure  that  a  feasible 
solution is perturbed to a better feasible solution.

4.  (Optional  step  for  deriving  exact  local  search  algorithms)  Determine  necessary 
conditions  for  a  local  optimum  to  be  a  global  optimum. 

5.  Instantiate  a  program  schema  with  the  components  of  the  theory  where  the  derived 
preconditions  on  transpositions  are  guards  on  the  application  of  a  transposition. 
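As an illustration of step 5, the following sketch instantiates a guarded local search for the minimal spanning tree problem using the subset theory's exchange neighborhood (add a non-tree edge i, remove a tree edge j). The two guards are our own plausible outcome of steps 2 and 3 for MST, not a derivation taken from [15]: a swap keeps a spanning tree exactly when j lies on the cycle that i closes, and it improves the tree when i weighs less than j. Function names and the sample graph are hypothetical; edges are assumed to be stored as (u, v) with u < v.

```python
def tree_path(tree, s, t):
    """Edges of the unique s-t path in a tree, found by depth-first search from s."""
    adj = {}
    for u, v in tree:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, stack = {s: None}, [s]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                stack.append(v)
    path, node = set(), t
    while parent[node] is not None:          # walk back from t to s
        p = parent[node]
        path.add((min(p, node), max(p, node)))
        node = p
    return path

def mst_local_search(edges, weight, tree):
    """Guarded local search over the exchange neighborhood of spanning trees.

    Feasibility guard: the removed tree edge j lies on the cycle closed by i.
    Improvement guard: weight[i] < weight[j].
    """
    tree = set(tree)
    while True:
        move = next(((i, j) for i in edges if i not in tree
                     for j in tree_path(tree, *i) if weight[i] < weight[j]), None)
        if move is None:
            return tree                      # no guarded transposition applies
        i, j = move
        tree = (tree - {j}) | {i}            # apply the transposition

w = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4, (0, 2): 5, (1, 3): 6}
start = {(0, 2), (1, 3), (0, 3)}             # a poor but feasible spanning tree
best = mst_local_search(list(w), w, start)
print(sorted(best), sum(w[e] for e in best))  # [(0, 1), (1, 2), (2, 3)] 6
```

Each applied swap strictly lowers the total weight, so the loop terminates. Because the edge-exchange neighborhood is known to be exact for MST (a spanning tree admitting no improving swap is minimal), the local optimum reached is a global one; with the distinct weights above it is the unique minimal tree.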

Proofs  for  the  correctness  of  the  tactic  and  a  detailed  description  illustrated  with  the 
derivation of the simplex algorithm can be found in [15]. This design tactic was developed
as  an  extension  of  the  KIDS  system  [31].  A  generalization  of  the  methods  for  matching 
library  theories  to  problem  specifications  is  presented  in  [32]. 

2.7  Summary  of  Methodologies  for  Program  Synthesis 

The previous subsections have reviewed various methodologies for automated or
semi-automated program synthesis. In order to generate production-quality code from
high-level problem specifications, a program synthesis system should be able to incorporate
constraints from all parts of a specification into each program fragment. This cannot be
achieved by translation systems, which restrict the window on the source code considered
in optimizing the output code.

The general dilemma is that the number of possible programs that can be generated
from a given specification is combinatorially explosive. General-purpose methods have
not been found for efficiently searching this space for efficient programs, and it is unlikely
that any such universal method exists. However, as described above, there are currently some
effective techniques for capturing the knowledge used by expert human programmers and
applying it in an automated fashion. Even setting aside the issue of program correctness, it
is worthwhile to capture this knowledge within a formal representation with semantically
well-founded inference methods. This avoids the limitations that arise when trying to
generalize and expand ad hoc methods. To avoid intractable searches during automated
program synthesis, knowledge representations and inference methods must be
carefully designed in tandem to limit search spaces. It is likely that, as progress continues
in automated program synthesis, improvements in formal knowledge
representations will be driven as much by considerations of making inference tractable
as by theoretical considerations in mathematical logic.




3  From  the  Software  Life  Cycle  to  the  Knowledge  Life  Cycle 

The  previous  section  described  various  methodologies  for  program  synthesis.  While 
automated  program  synthesis  is  necessary  to  raise  software  engineering  from  the 
programming  level  to  the  specification  level,  it  is  only  one  component  of  the  KBSE 
paradigm.  As  KBSE  matures  and  increasingly  automates  the  software  engineering  life 
cycle, software engineering resources will increasingly shift toward knowledge acquisition
and the automated reuse of knowledge for developing software artifacts. This section
describes  how  the  various  components  of  KBSE  could  interact. 

Knowledge,  like  software,  has  its  own  characteristic  life  cycle.  The  knowledge  life 
cycle is the maturation of design knowledge for an application domain from the initial
research  stage  to  the  cookbook  engineering  stage.  Knowledge-based  design  tools  can 
provide  support  at  stages  of  the  knowledge  life  cycle  that  are  not  well  supported  with 
conventional  software  design  tools.  Furthermore,  knowledge-based  design  tools  have  the 
potential  of  significantly  compressing  the  knowledge  life  cycle. 

One principal objective of KBSE is to compress the software life cycle with
knowledge-based  tools.  By  its  very  nature,  knowledge-based  design  depends  on  the 
maturity  of  knowledge  about  an  application  domain.  As  knowledge  about  an  application 
domain  is  developed  by  scientists  and  engineers,  the  design  process  for  that  domain 
makes  a  transition  from  creative  and  innovative  design  to  routine  and  cookbook  design. 
Different  kinds  of  design  tools  are  appropriate  at  different  stages  of  the  knowledge  life 
cycle.  One  of  the  main  anticipated  advantages  of  KBSE  is  the  ability  to  leverage  the 
expertise  of  scientists  and  engineers  by  capturing  their  design  knowledge  at  all  stages  of 
the  knowledge  life  cycle. 

An  example  of  the  knowledge  life  cycle  is  the  development  of  the  theory  and 
technology  for  designing  parsers  used  in  compilers  and  other  language-processing 
systems. Early parsers in the late fifties were ad hoc systems. Not only was there a lack
of  a  theory  of  parsing  to  guide  their  design,  there  was  not  even  a  theory  of  grammars  that 
specified  the  function  of  a  parser.  Thus  the  design  of  early  parsers  was  creative:  both  the 
structure  and  function  were  unknown  and  ill-defined.  The  development  of  the  BNF 
formalism  in  the  early  sixties  clarified  the  function  of  a  parser:  to  produce  a  trace  of  the 
BNF  rules  used  to  generate  a  text  string  from  the  text  string  itself.  At  this  stage  the  design 
of  parsers  became  innovative:  the  function  was  known,  but  the  structure  of  possible 
solutions  was  still  unexplored. 

The  mid  sixties  to  the  mid  seventies  witnessed  rapid  development  of  the  theory  and 
technology  for  parsing.  First,  recursive  descent  parsing  was  formalized,  enabling  parser 
development  to  become  a  routine  design  task.  While  routine,  the  design  of  these  early 
recursive descent parsers required the configuration of a set of procedures, i.e.,
configurational design. By the late sixties, several table-driven parsers were developed: operator
precedence, LL parsing, and LR parsing. (These early parsing formalisms did not handle
left recursion, which required the development of LALR parsing.) Designing a
table-driven parser is now a routine parametric design process, that is, the output of a design
is a set of parameters for a specialized representation. When knowledge about an
application domain becomes this advanced, then automated design tools can be readily
developed with conventional software technology. Hence in the late sixties and seventies,
parser generators were developed that take a specification of a grammar and
automatically generate the parameters for a table-driven parser.




[Figure 1: horizontal axis from "High Expertise Needed" to "Low Expertise Needed"]


Figure  1.  Spectrum  of  knowledge-based  tools  in  the  knowledge  life  cycle. 

Figure 1 shows how knowledge-based design tools fit into the spectrum from
creative design to routine parametric design. The horizontal axis denotes the level of
human expertise required to use the tool, while the diagonal line separates specification
tools from program derivation tools. The figure illustrates that knowledge-based tools can
be used much earlier in the knowledge life cycle than current CASE tools. Domain
modeling tools use a formal modeling language to express knowledge about an
application domain. This knowledge can be used for different operational goals throughout the
software life cycle, from requirements engineering to re-engineering. Domain modeling
is the first step in moving from creative design to routine design. Because domain
modeling is essentially the formalization of domain knowledge, it requires a high degree
of expertise, both in the application domain and in knowledge representation formalisms.
During the innovative phase of the knowledge life cycle, general-purpose interactive
program synthesis systems could be used to explore the solution space for an application
domain. Many of these general-purpose systems have facilities for recording, editing, and
replaying derivation histories. These replay facilities might enable users with less
expertise than the original designer to develop derivations for similar specifications [20].
Given domain knowledge from a domain modeling tool, formal specifications can be
developed and incrementally modified with tools such as ARIES [9].

Figure 1 also illustrates that knowledge-based tools can be employed by users with
lower levels of expertise than are required for current CASE tools. Front-end CASE tools
enable designers to define and edit software abstractions like data flow diagrams during



the initial stages of system design. However, because these CASE tools lack application
domain knowledge, they require more expertise and provide lower levels of verification
and simulation capabilities than can be provided with knowledge-based tools. A good
example of a knowledge-based specification tool is the WATSON system [11], which
interactively elicits and validates requirements for new telephone features (such as call
waiting) from telephony engineers using domain-level scenarios. Back-end CASE tools
such as application generators (e.g., parser generators) are currently widely used for the
final stages of coding, particularly in commercial data processing. They consist of a
menu-driven or application-language front end and a template-driven code generator back end.
As such, they are suitable for routine parametric design. In contrast, domain-specific
program synthesis systems can also be used for routine configurational design and have
a high-level user interface that reduces the expertise needed by an end user. Because
systems like ELF [29] and SYNAPSE [10] combine template-driven code generation with
more powerful AI semantic processing techniques and transformation rules, they can
tackle routine configurational design in addition to parametric design. They also produce
better-optimized code than an application generator.


[Figure 2: design stages Creative, Innovative, Routine Configurational, Routine Parametric; axis from "High Expertise Needed" to "Low Expertise Needed"]


Figure  2.  Transfer  and  reuse  of  expertise  in  the  KBSE  paradigm. 

Knowledge-based  technology,  by  providing  an  active  medium  for  communication 
of  knowledge,  can  also  potentially  compress  the  knowledge  life  cycle.  In  the  absence  of 
major  advances  in  machine  discovery,  the  development  of  design  knowledge  will 
continue  to  be  a  human-intensive  process  requiring  high  levels  of  expertise.  However, 
knowledge-based  tools  could  assist  human  research  scientists  and  engineers  in  the 
development  of  this  knowledge.  These  tools  could  then  compile  and  transfer  this 




knowledge into shells that do interactive requirements elicitation, redesign, specification
acquisition, and domain-specific program synthesis.

Figure 2 illustrates this transfer of expertise using knowledge-based subsystems; a
more detailed exposition can be found in [16]. The domain modeling assistant and
program design assistant would be used by scientists and engineers during the creative
and innovative stages of the knowledge life cycle. The specification acquisition shell
would support future system analysts in developing formal system specifications, while
the redesign shell would enable system developers to construct software systems rapidly
by editing and replaying derivation histories developed with a program design assistant.

For end users with low expertise, requirements elicitation shells would use domain
knowledge to interactively develop formal specifications of their requirements from
informal examples. Domain-specific program synthesis shells would then synthesize a
system meeting these requirements. Note that while a system developer with intermediate
expertise could be expected to understand and manipulate derivation histories to develop
a software system, an end user with low expertise would require fully automatic
program synthesis. Thus a domain-specific synthesis shell would require more highly
compiled control knowledge for controlling software derivation than a redesign shell, i.e.,
tactics or large-grained compiled transformations as opposed to interpreting and
manipulating derivation histories.

4 Conclusion

Knowledge-based software engineering is based on research spanning over two decades.
Significant commercial applications are likely within the next decade, particularly as
industrial pilot projects in domain-specific program synthesis and re-engineering mature.
Interest in formal methods will also spur development of the field. However, achieving
the full paradigm requires many technical advances in knowledge representation and
automated reasoning. This paper has described various methodologies for making
automated reasoning tractable for program synthesis. Further improvements are likely to
require that knowledge representations for program design expertise be developed in
tandem with automated reasoning methods.

The  paradigm  of  the  knowledge  life  cycle  can  help  to  clarify  the  role  of  knowledge- 
based  software  engineering  tools  and  guide  their  development.  Different  kinds  of 
knowledge-based  tools  are  appropriate  at  different  stages  of  the  knowledge  life  cycle. 
Furthermore,  knowledge-based  tools  can  expedite  the  transfer  of  expertise  from  research 
scientists  and  engineers  and  thus  compress  the  knowledge  life  cycle. 

References 

1. B.W. Boehm: Software Engineering Economics. Englewood Cliffs, NJ: Prentice Hall 1981

2. M. Broy, P. Pepper: Program Development as a Formal Activity. IEEE Trans. on
Software Eng. 7(1), 14-22 (1981)

3. C.L. Chang, R.C.T. Lee: Symbolic Logic and Mechanical Theorem Proving. New
York: Academic Press 1973

4. J. Darlington: An Experimental Program Transformation and Synthesis System.
Artificial Intelligence 16, 1-46 (1981)




5. R. Floyd: Toward Interactive Design of Correct Programs. In C. Rich, R.C. Waters
(eds.): Artificial Intelligence and Software Engineering. Los Altos: Morgan
Kaufmann 1986, pp. 331-334

6. C. Green: Application of Theorem Proving to Problem Solving. IJCAI (1969)

8. C. Green, D. Luckham, R. Balzer, T. Cheatham, C. Rich: Report on a
Knowledge-Based Software Assistant. In C. Rich, R.C. Waters (eds.): Artificial
Intelligence and Software Engineering. Los Altos: Morgan Kaufmann 1986, pp.
337-428

9. W.L. Johnson, M.S. Feather: Using Evolution Transformations to Construct
Specifications. In M.R. Lowry, R.D. McCartney (eds.): Automating Software
Design. Cambridge, MA: AAAI/MIT Press 1991, pp. 65-92

10. E. Kant, F. Daube, W. MacGregor, J. Wald: Scientific Programming by Automated
Synthesis. In M.R. Lowry, R.D. McCartney (eds.): Automating Software Design.
Cambridge, MA: AAAI/MIT Press 1991, pp. 169-206

11. V.E. Kelly, U. Nonnenmann: Reducing the Complexity of Formal Specification
Acquisition. In M.R. Lowry, R.D. McCartney (eds.): Automating Software
Design. Cambridge, MA: AAAI/MIT Press 1991, pp. 41-64

12. D.E. Knuth, P.B. Bendix: Simple Word Problems in Universal Algebras. In J.
Leech (ed.): Computational Problems in Abstract Algebra. Pergamon Press
1970, pp. 263-298

13. J. Komorowski: Synthesis of Programs in the Partial Deduction Framework. In
M.R. Lowry, R.D. McCartney (eds.): Automating Software Design. Cambridge,
MA: AAAI/MIT Press 1991, pp. 377-404

14. R.I. Kuper: Dependency-directed localization of software bugs. Tech. Report
1053, MIT AI Lab, 1989

15. M.R. Lowry: Automating the Design of Local Search Algorithms. In M.R. Lowry,
R.D. McCartney (eds.): Automating Software Design. Cambridge, MA:
AAAI/MIT Press 1991, pp. 515-546

16. M.R. Lowry: Software Engineering in the Twenty-First Century. In M.R. Lowry,
R.D. McCartney (eds.): Automating Software Design. Cambridge, MA:
AAAI/MIT Press 1991, pp. 627-654

17. M.R. Lowry, R. Duran: Knowledge-based Software Engineering. In A. Barr, P.R.
Cohen, E.A. Feigenbaum (eds.): The Handbook of Artificial Intelligence, Vol. 4.
Reading, MA: Addison-Wesley 1989

18.  M.R.  Lowry,  R.D.  McCartney  (eds.);  Automating  Software  Design.  Cambride, 
MA;  AAAI/MIT  Press  1991 

19.  Z.  Manna,  R.  Waldingcr;  A  Deductive  Approach  to  Program  Synthesis.  ACM 
Trans,  on  Prog.  Lang,  and  Sys.  2(1):  90-121  (1980) 

20.  J.  Mostow:  Design  by  Derivational  Analogy:  Issues  in  the  Automated  Replay  of 
Design  Plans.  Artificial  Intelligence  40. 1 19-184  (1989) 


234 


21. R. Paige, S. Koenig: Finite Differencing of Computable Expressions. ACM Transactions on Programming Languages and Systems 4(3): 402-454 (1982)

22. C.H. Papadimitriou, K. Steiglitz: Combinatorial Optimization. Englewood Cliffs, NJ: Prentice-Hall 1982

23. U.S. Reddy: Design Principles for an Interactive Program-Derivation System. In M.R. Lowry, R.D. McCartney (eds.): Automating Software Design. Cambridge, MA: AAAI/MIT Press 1991, pp. 453-482

24. C. Rich, R.C. Waters (eds.): Artificial Intelligence and Software Engineering. Los Altos: Morgan Kaufmann 1986

25. C. Rich: A formal representation for plans in the Programmer's Apprentice. IJCAI 1981

26. C. Rich, Y.A. Feldman: Seven Layers of Knowledge Representation and Reasoning in Support of Software Development. IEEE Trans. on Software Eng. 18(6), 451-469 (1992)

27. J.A. Robinson: A Machine-Oriented Logic Based on the Resolution Principle. J. ACM 12(1), 23-41 (1965)

28. H.B. Rubenstein, R.C. Waters: The Requirements Apprentice: Automated assistance for requirements acquisition. IEEE Trans. on Software Eng. 17, 226-240 (1991)

29. D. Setliff: Synthesizing VLSI Routing Software from Specifications. In M.R. Lowry, R.D. McCartney (eds.): Automating Software Design. Cambridge, MA: AAAI/MIT Press 1991, pp. 207-226

30. D.R. Smith, M.R. Lowry: Algorithm Theories and Design Tactics. Science of Computer Programming 14: 305-321 (1990)

31. D.R. Smith: KIDS - A Knowledge-Based Software Development System. In M.R. Lowry, R.D. McCartney (eds.): Automating Software Design. Cambridge, MA: AAAI/MIT Press 1991, pp. 483-514

32. D.R. Smith: Constructing Specification Morphisms. Kestrel Institute Technical Report KES.U.92.1 (1992)

33. R.J. Waldinger, R.C. Lee: PROW: A step toward automatic program writing. IJCAI-69, pp. 241-252

34. R.C. Waters: Program Translation via Abstraction and Reimplementation. IEEE Trans. on Software Eng. 14(8), 1207-1228 (1988)

35. D.S. Wile: Program Developments: Formal explanations of implementations. CACM 26(11): 902-911 (1983)


UPDATING  LOGIC  PROGRAMS 


Nicola  Leone,  Luigi  Palopoli*  and  Massimo  Romeo 
DEIS, Università della Calabria, 87036 Rende (CS), Italy.


Abstract. This paper proposes an update language for logic-programming-based knowledge systems. The language is built upon two basic update operators denoting insertions and deletions of positive literals (atoms), respectively. Several simple control structures have been defined by which basic updates can be combined to program complex updates. The presented approach is centered around the idea of executing a basic update operation by directly modifying the truth valuation of the (intensionally or extensionally defined) atom on which it is requested. The truth valuations of the atoms that inductively depend on the updated one are modified accordingly. Several examples are presented which show that both deterministic and non-deterministic transformations of a logic program are easily expressed within the update language.


1  Introduction 

Logic programming is today a well-established tool for various kinds of advanced application domains of knowledge engineering including, for instance, intelligent database interfaces and expert systems. Well-defined and efficient support for query answering is surely the most important feature of a logic-programming-based knowledge system. However, it is widely recognized that, in many applications, adequate support for updating the knowledge formalized in the logic program is needed [1, 18].

Roughly speaking, the problem of updating a logic program can be described as follows. Given a logic program P and an atom A to be inserted (resp., deleted), the task of updating is to modify the semantics of P in such a way that (1) the atom A is true (resp., false) in the modified semantics, and (2) the semantics is modified "consistently" with the rules in P. The latter point requires that a suitable form of "closure" be maintained within the modified semantics w.r.t. the inferences denoted by the rules in P. The closure property we adopt is a very simple and intuitive one, and is explained in the following.

In recent years great attention has been paid to the problem of logic program updating (e.g., [7, 12, 13]). As a matter of fact, however, no solution to the problem of updating intensional knowledge can be found in the literature which can be regarded as satisfactory in all application frameworks.

The  first  attempt  to  approach  the  problem  of  updating  logic  programs  was  made 
within  the  logic  language  Prolog  [15].  Indeed,  Prolog  includes  two  built-in  update 
operators,  namely,  assert  and  retract.  However,  these  update  operators  have  many 
semantic  and  operational  faults,  as  lucidly  pointed  out  by  D.S.  Warren  in  [17]. 
One  major  fault  of  these  operators  is  the  lack  of  a  clear  formal  semantics. 

To overcome these drawbacks, several alternative proposals have been developed [3, 5, 6, 7, 10, 13, 14]. Most of them approach the problem of updating a logic program as a generalization of the view update problem in the relational database framework (e.g., [4]). These solutions, generally cleanly formalized, are based on "pushing down" updates defined on intensional predicates towards base ones, because rules are regarded as certain, non-updatable knowledge. However, even if there is strong evidence in favour of adopting the "push-down" philosophy in many situations, there are still a number of applications for which it is not well suited. A typical case in which a "push-down" semantics fails to give a suitable meaning to update activities arises when some general rule holds at a given time, but some of its specific instances are to be invalidated afterwards.

* Current affiliation is CS Department, UCLA, Los Angeles (CA), USA. Email: luigi@cs.ucla.edu.

As an example, consider a knowledge base describing some aspects of life in a zoo. A portion of the logic program constituting the knowledge base is shown next:

r1: eats(X, banana) ← monkey(X)                        bear(yogi) ←
r2: eats(X, honey) ← bear(X)                           bear(bubu) ←
r3: eats(X, salmon) ← bear(X)                          monkey(kong) ←
r4: happy(X) ← bear(X), eats(X, honey), eats(X, salmon)
r5: slimming(X) ← bear(X), ¬eats(X, honey)

The above logic program describes the classification into species of the animals in the zoo, the foods each species is usually fed with, and some further relationships between species, diet, happiness and weight modification of an individual animal. For instance, yogi is a bear and so he usually eats honey and salmon. Furthermore, yogi is happy and he is not slimming. An exception to the ordinary diet occurs when, at some time, yogi the bear gets sick because he has eaten too much honey. In the new situation yogi is not allowed to eat any more honey. So we want a new semantics for the logic program in which the information that yogi can eat honey is no longer valid. In other words, the atom eats(yogi, honey) should no longer be part of the intended semantics of the logic program, so that yogi is prevented from eating further honey. Thus, from a semantic viewpoint, the expected result is to make the atom eats(yogi, honey) false. Moreover, some form of "consistency" w.r.t. the logic program is to be provided: atoms which are "no longer supported" by any ground rule instance are to be made false, whereas atoms which are supported by some rule, and are not true in the current semantics, are to be made true. In the zoo example at hand, the "consistency" principle implies that the atom slimming(yogi) is made true whereas the atom happy(yogi) is made false, and the truth valuations of all other atoms remain unchanged.

Following a "push-down" approach, the effect of the update would be to delete bear(yogi), which is clearly unintuitive since, even if sick, yogi is still a bear.

One could object that the expected result can be obtained, even if a "push-down" approach is assumed, by refining r2 into:

eats(X, honey) ← bear(X), ¬ate_too_much_honey(X)

and then adding the fact ate_too_much_honey(yogi) when needed. However, we note that, in a situation like the one we are describing here, there can be hundreds of causes determining hundreds of changes in the ordinary diet, and explicitly encoding all the possibilities in the rules is obviously not convenient. It is also worth pointing out that the expected result cannot be obtained by means of the Prolog retract operator, as it only allows the rule r2 to be eliminated completely, thereby forcing all bears not to eat honey.

The Update Logic Language (ULL) presented in this paper is based on the idea that updates requested on an intensional predicate q have to be carried out by directly modifying the set of tuples on which q holds, rather than changing the extensions of the base predicates on which q depends (in other words, no pushing down is performed). In order to maintain consistency w.r.t. the implications defined in the program, along with the directly involved atoms, the truth values of atoms which are logical consequences of them may have to be modified as well (the atoms happy(yogi) and slimming(yogi) in the example at hand). In fact, the expected semantics of the example at hand is obtained by executing the ULL delete operation -eats(yogi, honey).
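To make the intended effect of -eats(yogi, honey) concrete, the following Python sketch grounds the zoo program over the three animals only (an assumption: the remaining instances of r1-r5 can never fire), computes its well-founded model with Van Gelder's alternating fixpoint, and reads the deletion as removing every ground rule whose head is eats(yogi, honey), in the spirit of the formal definition given in Section 3. The helper names (least_model, well_founded, ground_zoo, delete) are hypothetical; this is an illustration, not the authors' implementation.

```python
# Minimal illustrative sketch (hypothetical helper names, not the authors' code).
# A ground rule is a (head, positive_body, negative_body) triple of string atoms.

def least_model(rules, assumed):
    """Least model of the reduct of `rules` w.r.t. the atom set `assumed`."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # The rule survives the reduct only if no negated atom is assumed true.
            if head not in model and pos <= model and not (neg & assumed):
                model.add(head)
                changed = True
    return model


def well_founded(rules):
    """Alternating fixpoint: returns (true atoms, true-or-undefined atoms)."""
    true = set()
    while True:
        possible = least_model(rules, true)       # overestimate
        new_true = least_model(rules, possible)   # underestimate
        if new_true == true:
            return true, possible
        true = new_true


def fact(atom):
    return (atom, frozenset(), frozenset())


def ground_zoo(animals=("yogi", "bubu", "kong")):
    """Facts plus the ground instances of r1-r5 over the three animals."""
    rules = [fact("bear(yogi)"), fact("bear(bubu)"), fact("monkey(kong)")]
    for x in animals:
        rules += [
            (f"eats({x},banana)", frozenset({f"monkey({x})"}), frozenset()),  # r1
            (f"eats({x},honey)", frozenset({f"bear({x})"}), frozenset()),     # r2
            (f"eats({x},salmon)", frozenset({f"bear({x})"}), frozenset()),    # r3
            (f"happy({x})",                                                  # r4
             frozenset({f"bear({x})", f"eats({x},honey)", f"eats({x},salmon)"}),
             frozenset()),
            (f"slimming({x})", frozenset({f"bear({x})"}),                    # r5
             frozenset({f"eats({x},honey)"})),
        ]
    return rules


def delete(rules, atom):
    """Ground deletion -Q: drop every ground rule whose head is Q."""
    return [r for r in rules if r[0] != atom]


before, _ = well_founded(ground_zoo())
after, _ = well_founded(delete(ground_zoo(), "eats(yogi,honey)"))
print(sorted(after - before))   # ['slimming(yogi)']
print(sorted(before - after))   # ['eats(yogi,honey)', 'happy(yogi)']
```

Note that bear(yogi) is true in both models: unlike a push-down reading, the deletion leaves the base facts alone.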

ULL includes two basic update operators, which are used to specify deletions (like the previous one) and insertions of atoms, respectively. Basic updates are not expressive enough on their own, and so they are combined in ULL to form more complex updates as follows. First, several basic update operations can be sequenced by listing them one after another in the order in which they are intended to be executed. Second, it is possible to specify two conditions, each consisting of a logical goal (i.e., a possibly negated conjunction of literals), which influence the execution of an update. The former condition (precondition) is evaluated against the program semantics on which the update is requested, while the latter one (postcondition) is evaluated against the program semantics which would result from the application of the requested update operation. If both conditions are satisfied, then the update becomes persistent (i.e., the knowledge is modified according to the requested update operations). Otherwise, the update is aborted and the logic program remains unchanged. Either condition may be empty, in which case the trivial condition True is assumed. A further role played by pre- and postconditions is to construct the substitutions which are used to execute an update (see below). We notice that our notion of pre- and postconditions is a simplified form of the state-testing conditions of dynamic logics [8, 9]. Third, the programmer is given the possibility of choosing one out of three execution modalities under which an update is to be executed. In more detail, the possible forms of a ULL update are the following:

a. ∃(C1 U C2), which denotes an update with an existential execution modality (existential update, for short);

b. ∀(C1 U C2), which denotes an update with a universal execution modality (universal update, for short);

c. *(C1 U C2), which denotes an update with an iterated execution modality (iterated update, for short);

where C1 and C2 are (possibly empty) goals, and U is an update operation of the form u1 u2 ... un, each ui (1 ≤ i ≤ n) being a basic update.¹

As its name suggests, the execution modality determines the way an update is executed. Intuitively, an existential update ∃(C1 U C2) is executed by running Uσ on the logic program at hand, where σ is a substitution chosen non-deterministically amongst those making the condition C1 true against the initial semantics and C2 true against the semantics obtained by executing Uσ. If no such substitution exists, then the existential update has a null effect. Consider a universal update ∀(C1 U C2) and let Σ be the set of substitutions making C1 true with respect to the initial semantics. The universal update has a non-null effect only if, for each σ ∈ Σ, C2σ is true with respect to the updated semantics obtained by executing u1Σ ... unΣ, where U = u1 ... un, in which case this updated semantics is the result of the update (note that uiΣ denotes the execution of all the operations uiσ such that σ ∈ Σ). An iterated update *(C1 U C2) is executed by iteratively executing the existential update ∃(C1 U C2) until two successive applications of this existential update yield an identical result, or C1 or C2 (or both) is no longer satisfiable against the respective logic program.

Finally, given k updates, it is possible to specify their execution in sequence. To do this, ULL includes a sequential composition operator, denoted by ";".

The rest of the paper is organized as follows. Section 2 contains a number of examples of simple update problems and their solutions with ULL, which should help in informally illustrating the main characteristics of our approach. Then, Section 3 provides the formal definition of the update language ULL.

¹ Note that the correct ULL syntax for the previous update -eats(yogi, honey) would have been either ∃(-eats(yogi, honey)), or ∀(-eats(yogi, honey)), or *(-eats(yogi, honey)). However, as shall be apparent, the semantics of variable-free updates is independent of the execution modality. Therefore, in this case, w.l.o.g., we allow the modality specifier to be dropped.

2  Sample  Updates 

In this section we shall present a number of examples which should shed some light on the structure of the ULL language. Some of the presented problems refer to updates requested on extensional predicates. However, we recall that our approach is based on the idea of making no difference between updating extensional predicates and updating intensional ones. Therefore, in our framework, the solutions to those problems, illustrated in the following, are completely "general". Before facing more involved update problems, let us give one more example of a very simple update.

Assume that, because of his disease, yogi the bear temporarily has to eat oranges. Then the logic program of the example can be updated using +eats(yogi, orange). When yogi eventually recovers his health, his diet can be reset to the ordinary one. This can be accomplished by

-eats(yogi, orange) +eats(yogi, honey).

Such an update, besides the two directly requested diet modifications (i.e., yogi does not eat oranges and he eats honey), also causes modifications of the semantics of the predicates depending on eats. In particular, yogi is again happy and not slimming. Hence, the original semantics of P_zoo is completely restored. We note that the previous update consists of a sequence of two operations, which are executed one after another, from left to right, in the order in which they appear.
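The round trip can be checked mechanically. The Python sketch below is illustrative only: it restates small hypothetical helpers (least_model, well_founded, ground_zoo, delete, insert) so that the block is self-contained, grounds the zoo program over the three animals only, reads +Q as adding the fact Q ← and -Q as removing every ground rule with head Q, and compares well-founded models before and after the sequence.

```python
# Illustrative sketch with hypothetical helpers; ground rules are
# (head, positive_body, negative_body) triples of string atoms.

def least_model(rules, assumed):
    """Least model of the reduct of `rules` w.r.t. the atom set `assumed`."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head not in model and pos <= model and not (neg & assumed):
                model.add(head)
                changed = True
    return model


def well_founded(rules):
    """Alternating fixpoint: returns (true atoms, true-or-undefined atoms)."""
    true = set()
    while True:
        possible = least_model(rules, true)
        new_true = least_model(rules, possible)
        if new_true == true:
            return true, possible
        true = new_true


def fact(atom):
    return (atom, frozenset(), frozenset())


def ground_zoo(animals=("yogi", "bubu", "kong")):
    rules = [fact("bear(yogi)"), fact("bear(bubu)"), fact("monkey(kong)")]
    for x in animals:
        rules += [
            (f"eats({x},banana)", frozenset({f"monkey({x})"}), frozenset()),
            (f"eats({x},honey)", frozenset({f"bear({x})"}), frozenset()),
            (f"eats({x},salmon)", frozenset({f"bear({x})"}), frozenset()),
            (f"happy({x})",
             frozenset({f"bear({x})", f"eats({x},honey)", f"eats({x},salmon)"}),
             frozenset()),
            (f"slimming({x})", frozenset({f"bear({x})"}),
             frozenset({f"eats({x},honey)"})),
        ]
    return rules


def delete(rules, atom):
    """Ground deletion -Q: drop every ground rule whose head is Q."""
    return [r for r in rules if r[0] != atom]


def insert(rules, atom):
    """Ground insertion +Q: add the fact Q <-."""
    return rules + [fact(atom)]


original = ground_zoo()
sick = delete(original, "eats(yogi,honey)")          # -eats(yogi, honey)
on_oranges = insert(sick, "eats(yogi,orange)")       # +eats(yogi, orange)
# Sequence -eats(yogi, orange) +eats(yogi, honey), executed left to right.
recovered = insert(delete(on_oranges, "eats(yogi,orange)"), "eats(yogi,honey)")

print(well_founded(recovered)[0] == well_founded(original)[0])  # True
```

The recovered program is not syntactically the original one (the ground instance of r2 for yogi has been replaced by a fact), but its well-founded model coincides with the original model, which is what the text means by the semantics being restored.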

As already stated, the ULL language allows different execution modalities to be used for updates. Let us start with the ∀ modality. For an example of it, assume that the zoo has several employees, some of whom look after the animals. Each employee is characterized by his name, his seniority, expressed in years, and his monthly salary, expressed in dollars. Furthermore, employees looking after bears receive an extra salary of 100 dollars per month. So, assume the following facts belong to the logic program.

employee(tom, 1, 1500) ←               look_after(tom, kong) ←
employee(john, 2, 2000) ←              look_after(john, yogi) ←
employee(bob, 3, 2500) ←               look_after(bob, bubu) ←
extra_pay(X, 100) ← bear(Y), look_after(X, Y)

To refer to a classical database example, assume that the salary of all the employees who have been working at the zoo for at least two years is to be raised by five percent. Then, a set-oriented update of the logic program can be executed, which is expressed as follows in ULL:

∀(employee(X, Z, Y), Z ≥ 2  -employee(X, Z, Y) +employee(X, Z, Y × 1.05))

Intuitively, the above update works as follows: first, the set Σ of the substitutions satisfying the conjunction employee(X, Z, Y), Z ≥ 2 is computed (i.e., the set of all employees with a working seniority of at least two years is determined); second, for each substitution σ in Σ, the update operation -employee(X, Z, Y)σ is executed (i.e., the old employee's tuple is removed); finally, for each σ in Σ, the update operation +employee(X, Z, Y × 1.05)σ is executed (i.e., the new employee's tuple, including his raised salary, is inserted). Since no postcondition is present, the update succeeds and the modification becomes persistent. In order to show the use of postconditions, assume that the increase of the employees' salaries has to be executed only if it does not cause an employee with two years of seniority to earn more than 2100 dollars as his base salary. Such a constraint on the execution of the update can be imposed by simply adding the postcondition ¬(employee(X, 2, T), T > 2100) to the previous update:

∀(employee(X, Z, Y), Z ≥ 2  -employee(X, Z, Y) +employee(X, Z, Y × 1.05)  ¬(employee(X, 2, T), T > 2100))

Thus, in the case of a universal update, the precondition determines the set of substitutions on which the update operation is to be executed (the employees whose salary has to be raised, in the example at hand), whereas the postcondition determines whether the update has to be actually executed or not (it is a sort of constraint on the whole execution of the update). Notice, incidentally, how naturally and compactly ULL allows the above updates to be expressed.
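The universal salary update with its postcondition can be mimicked on a purely extensional employee relation. The Python sketch below is illustrative only: the function name raise_salaries and its parameters are hypothetical, salaries are assumed to be whole dollars (so Y × 1.05 is computed as y * 105 // 100), and the extra_pay rule is ignored because nothing in the update depends on it. All deletions for Σ are applied before all insertions, and the postcondition is then checked for every σ, a single violation aborting the whole update.

```python
def raise_salaries(employees, min_seniority=2, cap=2100):
    """All-or-nothing 5% raise for employees with seniority >= min_seniority."""
    # Precondition employee(X, Z, Y), Z >= 2: the whole family Sigma is fixed first.
    family = [(x, z, y) for (x, z, y) in employees if z >= min_seniority]
    # -employee(X, Z, Y) for every sigma, then +employee(X, Z, Y x 1.05) for every sigma.
    updated = set(employees) - set(family)
    updated |= {(x, z, y * 105 // 100) for (x, z, y) in family}
    # Postcondition not(employee(X, 2, T), T > cap), checked for every sigma
    # against the updated facts; a single violation aborts the whole update.
    for (x, _, _) in family:
        if any(n == x and z == 2 and t > cap for (n, z, t) in updated):
            return set(employees)
    return updated


staff = {("tom", 1, 1500), ("john", 2, 2000), ("bob", 3, 2500)}
print(sorted(raise_salaries(staff)))
# [('bob', 3, 2625), ('john', 2, 2100), ('tom', 1, 1500)]
```

With a tighter, hypothetical cap such as raise_salaries(staff, cap=2050), john's new salary violates the postcondition and the original facts are returned unchanged, i.e., the update has a null effect.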

As already mentioned, ULL allows iteration of tuple-oriented updates to be specified. Iteration is necessary in order to endow the update language with good expressive power; note that ULL does not provide recursion, unlike, for instance, the top-down language in [12] or the bottom-up language in [2]. For an example of an iterated update, assume that we want to level up the salaries of employees with the same seniority. This is accomplished by the following update:

*(employee(X, Y, Z), employee(X', Y, Z'), X ≠ X', Z < Z'  -employee(X, Y, Z) +employee(X, Y, Z'))

where the single leveling-up step -employee(X, Y, Z) +employee(X, Y, Z') is iterated while there exists a pair of employees with the same seniority Y but different salaries. It should be clear that the execution of an iterated update can be non-terminating. However, this is the same situation one has to handle when using while loops in conventional programming languages. In any case, some time-out mechanism could be provided if necessary.
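A sketch of the iterated leveling update follows, again on an extensional employee relation. The function level_salaries and the four-employee data in the usage are hypothetical (in the zoo facts above every seniority value is distinct, so nothing would change). Python's random.choice stands in for the non-deterministic choice of a substitution; each loop iteration corresponds to one application of the existential update, and the loop stops when the precondition is no longer satisfiable. In this example every step raises one salary to a strictly higher salary already present in the same seniority group, so the loop terminates.

```python
import random


def level_salaries(employees, rng=random):
    """Iterated update: repeat one leveling step until the precondition fails."""
    current = set(employees)
    while True:
        # Precondition employee(X, Y, Z), employee(X2, Y, Z2), X != X2, Z < Z2.
        candidates = sorted(
            ((x, y, z), z2)
            for (x, y, z) in current
            for (x2, y2, z2) in current
            if y == y2 and x != x2 and z < z2
        )
        if not candidates:
            return current
        # Non-deterministic choice of one substitution (the existential step).
        (x, y, z), z2 = rng.choice(candidates)
        # Update operation -employee(X, Y, Z) +employee(X, Y, Z2).
        current = (current - {(x, y, z)}) | {(x, y, z2)}


staff = {("ann", 2, 1800), ("john", 2, 2000), ("sue", 2, 1900), ("tom", 1, 1500)}
print(sorted(level_salaries(staff, random.Random(0))))
# [('ann', 2, 2000), ('john', 2, 2000), ('sue', 2, 2000), ('tom', 1, 1500)]
```

Whatever choices are made, every employee ends up with the highest salary of his seniority group, so the final state here does not depend on the random seed.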

For other instances of ULL updates, let us switch away from the zoo example and consider the following logic program P_graph, defining the directed graph G1 (shown in Figure 2.1) along with its transitive closure.

r1: tc(X, Y) ← arc(X, Y)
r2: tc(X, Y) ← arc(X, Z), tc(Z, Y)
arc(a, b) ←      arc(b, a) ←      arc(b, c) ←      arc(c, a) ←


[Fig. 2.1: the graph G1.  Fig. 2.2: G1 after duplicating node a.  Fig. 2.3: one orientation of G1.]


Assume we want to create a "duplicate" of the node a in the graph, i.e., a node a' which has the same sets of incoming and outgoing arcs as a. This is obtained by:

∀(arc(a, X) +arc(a', X)); ∀(arc(X, a) +arc(X, a'))

Note that the above update consists of two updates, ∀(arc(a, X) +arc(a', X)) and ∀(arc(X, a) +arc(X, a')), combined through a sequence control structure, denoted in ULL by ";". The resulting graph is shown in Figure 2.2. It is worth pointing out once again that, according to ULL semantics (see Section 3), the truth valuation associated with the intensional predicate tc is modified accordingly. For instance, the atom tc(a', c) becomes true after the update is executed.
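The node-duplication update and its effect on tc can be checked with a small Python sketch. Arcs are assumed to be (source, target) pairs; duplicate_node and transitive_closure are hypothetical names, and tc is recomputed as the least fixpoint of r1 and r2 rather than being stored. The second universal update is evaluated on the program produced by the first, as the ";" composition requires.

```python
def transitive_closure(arcs):
    """Least fixpoint of r1: tc(X,Y) <- arc(X,Y) and r2: tc(X,Y) <- arc(X,Z), tc(Z,Y)."""
    tc = set(arcs)
    while True:
        new = {(x, y) for (x, z) in arcs for (z2, y) in tc if z == z2} - tc
        if not new:
            return tc
        tc |= new


def duplicate_node(arcs, node, copy):
    """forall(arc(a, X) +arc(a', X)) ; forall(arc(X, a) +arc(X, a'))."""
    # First universal update, evaluated on the original arcs.
    arcs = set(arcs) | {(copy, y) for (x, y) in arcs if x == node}
    # Second update, evaluated on the program produced by the first one.
    arcs |= {(x, copy) for (x, y) in arcs if y == node}
    return arcs


G1 = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")}
G2 = duplicate_node(G1, "a", "a'")
print(sorted(G2 - G1))
print(("a'", "c") in transitive_closure(G2))    # True
```

The new atom tc(a', c) is derived through the path a' → b → c, without any tc fact having been inserted explicitly.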

An "orientation" of a graph G is a maximal subgraph of G containing no pairs of symmetric arcs. The following ULL update orients the graph G1.

*(arc(X, Y), arc(Y, X)  -arc(X, Y))

The  previous  update  is  non-deterministic.  One  of  its  results  is  shown  in  Figure 
2.3. 
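The orientation update can be sketched as a loop that, while some pair of symmetric arcs remains, deletes one arc chosen at random, with Python's random.choice modelling ULL's non-deterministic choice. The function name orient is hypothetical. Note that, exactly as with the precondition arc(X, Y), arc(Y, X), a self-loop arc(X, X) would also be removed.

```python
import random


def orient(arcs, rng=random):
    """*(arc(X, Y), arc(Y, X)  -arc(X, Y)): drop one symmetric arc per step."""
    arcs = set(arcs)
    while True:
        # All substitutions satisfying the precondition arc(X, Y), arc(Y, X).
        symmetric = sorted((x, y) for (x, y) in arcs if (y, x) in arcs)
        if not symmetric:
            return arcs
        arcs.discard(rng.choice(symmetric))   # non-deterministic pick


G1 = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")}
print(sorted(orient(G1, random.Random(7))))
```

On G1 the only symmetric pair is a ↔ b, so the two possible results are G1 without arc(a, b) and G1 without arc(b, a); which one is produced depends on the random choice.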

We remark on the importance of the non-determinism built into the language. Indeed, an orientation of the graph cannot be obtained by a deterministic language that does not allow "pick-up-one" choices among indistinguishable data [2].

3  The  Update  Logic  Language 

This section makes use of some basic notions concerning logic programming and well-founded semantics. For reasons of space, such notions are not reported here. The reader may refer to [11, 16] for a detailed presentation of these topics.

Given a logic program $P$, we take the well-founded model of $P$, denoted $WF(P)$, as the intended meaning of $P$. Thus, given a ground positive goal $G$ (resp., negative goal $\neg(G)$), we say that $P$ implies $G$ (resp., $P$ implies $\neg(G)$), denoted $P \vdash G$ (resp., $P \vdash \neg(G)$), if and only if $G$ is true (resp., $G$ is false) with respect to $WF(P)$.

The  sequel  of  this  section  formally  presents  both  the  syntax  and  the  semantics  of 
ULL. 

3.1  Syntax 

The primitive manipulations definable on a logic program are insertions and deletions of atoms. Insertions and deletions are denoted by the update operators $+$ and $-$, respectively. Thus, a simple update operation is a syntactic expression of the form $\odot Q$, where $\odot \in \{+, -\}$ is an update operator and $Q$ is an atom. If $Q$ is ground, then $\odot Q$ is a ground simple update operation. In particular, $+Q$ is called an insertion, while $-Q$ is called a deletion.

Simple updates are combined in sequences to express update operations involving more atoms. Let $\{u_1, \ldots, u_n\}$ denote a group of simple updates. Then $u_1 \cdots u_n$ is an update operation. If $u_1, \ldots, u_n$ are all ground, then $u_1 \cdots u_n$ is a ground update operation. Intuitively, executing the update operation $u_1 \cdots u_n$ corresponds to executing the simple update operation $u_1$ first; then $u_2$ is fired, and so on, up to the execution of the simple update $u_n$.

The update operations defined above are the basic primitives for modifying the meaning of a program. However, many real-world situations require that an update be executed only if some conditions are verified. The general updates that we define next allow us to deal with such situations by specifying a precondition and a postcondition for the update execution. In order to endow our language with good flexibility and expressiveness, we introduce three different types of updates, along with a composition operator which allows updates to be executed in sequence.

Definition. Let $C_1$, $C_2$ denote two goals, and $U$ be an update operation. Then

1. $\exists(C_1\;U\;C_2)$ is a general update (called existential update);

2. $\forall(C_1\;U\;C_2)$ is a general update (called universal update);

3. $*(C_1\;U\;C_2)$ is a general update (called iterated update);

4. $\Psi_1;\Psi_2$ is a general update (called compound update) if $\Psi_1$ and $\Psi_2$ are general updates.
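The syntax above maps naturally onto a small algebraic data type. The following Python sketch is one possible encoding, not part of ULL's definition: the class names, the modality strings and the convention that capitalised arguments are variables are all assumptions, and goals are simplified to conjunctions of positive atoms (negated literals and built-ins such as Z ≥ 2 are not modelled).

```python
from dataclasses import dataclass
from typing import Tuple, Union

Atom = Tuple[str, ...]   # e.g. ("arc", "a", "X"); capitalised arguments are variables


@dataclass(frozen=True)
class Simple:
    """Simple update operation: +Q (insertion) or -Q (deletion)."""
    op: str
    atom: Atom

    def __post_init__(self):
        if self.op not in ("+", "-"):
            raise ValueError("update operator must be '+' or '-'")

    def is_ground(self) -> bool:
        return not any(arg[:1].isupper() for arg in self.atom[1:])


@dataclass(frozen=True)
class General:
    """exists/forall/iterate (C1 U C2); an empty goal stands for True."""
    modality: str                 # "exists", "forall" or "iterate"
    pre: Tuple[Atom, ...]         # C1, simplified to a conjunction of atoms
    ops: Tuple[Simple, ...]       # U = u1 u2 ... un
    post: Tuple[Atom, ...] = ()   # C2

    def __post_init__(self):
        if self.modality not in ("exists", "forall", "iterate"):
            raise ValueError("unknown execution modality")


@dataclass(frozen=True)
class Compound:
    """Psi1 ; Psi2"""
    first: "Update"
    second: "Update"


Update = Union[General, Compound]

# The node-duplication update of Section 2 as a compound update.
duplicate_a = Compound(
    General("forall", (("arc", "a", "X"),), (Simple("+", ("arc", "a'", "X")),)),
    General("forall", (("arc", "X", "a"),), (Simple("+", ("arc", "X", "a'")),)),
)
```

Making the classes frozen keeps update terms immutable and hashable, which suits their role as pure syntax.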

3.2  Formal  Semantics 

The core idea of our approach can be rephrased as follows:

i. the insertion $+Q$ of a ground atom $Q$ modifies the semantics of $P$ by adding to its well-founded model $WF(P)$ the atom $Q$ along with its logical consequences (according to $P$);

ii. the deletion $-Q$ of a ground atom $Q$ discards from $WF(P)$ the atom $Q$ and all those atoms which have been inductively inferred from $Q$.

Since the semantics of a logic program is completely determined by its instantiation, we regard an update on $P$ as a transformation of $ground(P)$. Indeed, it is easily seen that, for each program $P$, $WF(P) = WF(ground(P))$. Then, the meaning of the program after the execution of the update will be the well-founded model of its transformed instantiation. Thus, the semantics of updates will be given by specifying a mapping $\varphi$ which takes a ground program $ground(P)$ and a general update $\Psi$ as operands and returns the (ground) program resulting from updating $ground(P)$ by $\Psi$. The transformation $\varphi$ is defined in terms of another function $\tau$, which takes a ground program $ground(P)$, an update operation $U$ and a family $\Sigma$ of substitutions as its operands and returns a resulting (ground) program $\tau(ground(P), U, \Sigma)$. From now on, and throughout this section, we assume that a logic program $P$ has been fixed, and we denote its instantiation $ground(P)$ simply by $P$. The mapping $\tau$ is inductively defined as follows.

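if $U = +Q$, then
$$\tau(P, +Q, \Sigma) = P \cup \{\, \sigma(Q) \leftarrow \;\mid\; \sigma \in \Sigma \,\};$$

if $U = -Q$, then
$$\tau(P, -Q, \Sigma) = \{\, r \in P \mid \nexists\, \sigma \in \Sigma \text{ s.t. } H(r) = \sigma(Q) \,\};$$

if $U = \odot_1 Q_1\, \odot_2 Q_2 \cdots \odot_k Q_k$, then
$$\tau(P, \odot_1 Q_1\, \odot_2 Q_2 \cdots \odot_k Q_k, \Sigma) = \tau(\tau(P, \odot_1 Q_1, \Sigma), \odot_2 Q_2 \cdots \odot_k Q_k, \Sigma),$$

where $H(r)$ denotes the head of the rule $r$.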
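The three clauses defining τ translate almost directly into code. In the Python sketch below (hypothetical names; ground rules are assumed to be (head, body) pairs with a frozenset body, atoms are tuples with the predicate name first, and a substitution is a dict from variable names to constants), each simple update in the sequence is applied to the result of the previous one, using the same family Σ throughout.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Atom = Tuple[str, ...]                  # ("arc", "X", "Y"): predicate, then arguments
Rule = Tuple[Atom, FrozenSet[Atom]]     # (head, body); tau only inspects the head
Subst = Dict[str, str]                  # variable name -> constant


def substitute(sigma: Subst, atom: Atom) -> Atom:
    """Apply sigma to the arguments of an atom; unbound variables are kept."""
    return (atom[0],) + tuple(sigma.get(t, t) for t in atom[1:])


def tau(program: Set[Rule], ops: List[Tuple[str, Atom]],
        family: List[Subst]) -> Set[Rule]:
    """tau(P, u1 u2 ... un, Sigma), applying u1 first and then the rest."""
    for op, atom in ops:
        targets = {substitute(s, atom) for s in family}
        if op == "+":
            # P united with { sigma(Q) <- | sigma in Sigma }
            program = program | {(t, frozenset()) for t in targets}
        else:
            # { r in P | there is no sigma in Sigma with H(r) = sigma(Q) }
            program = {r for r in program if r[0] not in targets}
    return program


facts = {(("arc", x, y), frozenset())
         for (x, y) in [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a")]}
tc_rule = (("tc", "a", "b"), frozenset({("arc", "a", "b")}))  # a ground instance of r1
P = facts | {tc_rule}
family = [{"Y": "a", "X": "c"}, {"Y": "c", "X": "b"}, {"Y": "a", "X": "b"}]
print(tau(P, [("-", ("arc", "X", "Y"))], family)
      == {(("arc", "a", "b"), frozenset()), tc_rule})   # True
```

Because deletion only looks at rule heads, the ground tc rule survives even though its body atom is never touched; its truth value is then recomputed from the remaining program.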

Now, we can define the semantics of updates. The first update we consider is the existential one.

Definition. An existential update $\exists(C_1\;U\;C_2)$ is applicable if there exists a substitution $\sigma$ such that: (i) $P \vdash \sigma(C_1)$, and (ii) $\tau(P, U, \{\sigma\}) \vdash \sigma(C_2)$. If $\exists(C_1\;U\;C_2)$ is applicable, then $\varphi(P, \exists(C_1\;U\;C_2)) = \tau(P, U, \{\sigma\})$, where $\sigma$ is any substitution verifying conditions (i) and (ii) above; otherwise, $\varphi(P, \exists(C_1\;U\;C_2)) = P$. □

Thus, the execution of an existential update, say $\exists(C_1\;U\;C_2)$, corresponds to the execution of the ground update operation $\sigma U$, where $\sigma$ is a substitution such that the precondition $\sigma C_1$ is verified in $P$ and the postcondition $\sigma C_2$ is true in the program transformed by the update operation $\sigma U$. It is worth noting that $\sigma$ is chosen non-deterministically in the set of substitutions satisfying such conditions. We notice that, in case either of the conditions is empty, the trivial condition True is assumed. Therefore, if $C_1$ is empty, the update is executed along with any substitution satisfying $C_2$ in the resulting program. If $C_2$ is empty, then the update is executed along with any substitution satisfying $C_1$ in the logic program.

The examples of this section will refer to the logic program P_graph, defining the directed graph G1, introduced in Section 2.

Example. To eliminate one of the arcs ending at a node Y such that the arc (b, Y) is in the graph, we execute the general update:

∃(arc(b, Y), arc(X, Y)  -arc(X, Y)).

Here, the substitutions verifying the requested precondition are σ1 = {Y/a, X/c}, σ2 = {Y/c, X/b}, and σ3 = {Y/a, X/b}.² Since no postcondition has to be verified, all such substitutions are eligible for instantiating the update operation. Hence, the possible values of

φ(ground(P_graph), ∃(arc(b, Y), arc(X, Y)  -arc(X, Y))),

which represents the result of the execution of the above update on ground(P_graph), are:

iPi=  larc(a,  b),  arc(b,a),arc(b,  c)]  uTC , 

P2=  larc(a,  b),  arc(b,  a),  arc(c,a)}  '<jTC ,  and. 


^  Note  that,  even  if  a  substitution  is  a  mapping  from  V  to  Bp,  we  have  specified  only  the  assign¬ 
ments  of  the  variables  appearing  in  the  update,  as  the  values  assigned  to  the  other  variables  does  not 
affect  the  semantics  of  the  update. 


242 


P^=  (arc{a,  b),  arc(b,  c)  ^rc(c ,a)]  uTC 
where  TC  is  the  set  of  all  ground  instances  of  the  two  rules  defining  the  predicate 
symbol  tc  (which  remains  unchanged). 

If one wishes to avoid deleting the arcs starting from the node $b$, one simply has to add the condition $X \neq b$ to the precondition. In this case the execution is deterministic, as the only applicable substitution is $\sigma_1$; so the execution yields $P_1$. On the other hand, if one wants some arc still to reach the node $c$ after the execution of the update, then the general update can be modified by adding the postcondition $arc(Z,c)$:

$\exists(arc(b,Y), arc(X,Y)\;\; -arc(X,Y)\;\; arc(Z,c))$.

In this case the substitution $\sigma_2$ is not applicable, since it deletes $arc(b,c)$, the only arc reaching $c$; thus the result of the execution is either $P_1$ or $P_3$. □
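The existential case can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: ground facts are tuples such as `("arc", "a", "b")`, upper-case strings are variables, the `tc` rules are left out because this update never touches them, and an update with no applicable substitution is assumed to leave the program unchanged. All function names are hypothetical.

```python
def is_var(t):
    """Upper-case strings are variables (a Prolog-like convention assumed here)."""
    return isinstance(t, str) and t[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    s = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s


def solutions(goals, facts, subst=None):
    """Yield every substitution that makes the conjunction `goals` true in `facts`."""
    subst = {} if subst is None else subst
    if not goals:
        yield subst
        return
    for fact in facts:
        s = match(goals[0], fact, subst)
        if s is not None:
            yield from solutions(goals[1:], facts, s)


def apply_op(facts, op, s):
    """Instantiate the update operation ('+' inserts, '-' deletes) with s."""
    sign, atom = op
    g = tuple(s.get(t, t) if is_var(t) else t for t in atom)
    return facts - {g} if sign == "-" else facts | {g}


def exists_update(facts, pre, op, post=()):
    """All possible results of an existential update: one per applicable substitution."""
    results = set()
    for s in solutions(list(pre), facts):
        new = apply_op(facts, op, s)
        # the postcondition must hold in the new program, extending s if needed
        if next(solutions(list(post), new, s), None) is not None:
            results.add(new)
    return results or {facts}


G1 = frozenset({("arc", "a", "b"), ("arc", "b", "a"),
                ("arc", "b", "c"), ("arc", "c", "a")})
PRE = [("arc", "b", "Y"), ("arc", "X", "Y")]
OP = ("-", ("arc", "X", "Y"))
print(len(exists_update(G1, PRE, OP)))                        # 3
print(len(exists_update(G1, PRE, OP, [("arc", "Z", "c")])))   # 2
```

The two calls mirror the example: three possible results ($P_1$, $P_2$, $P_3$) without a postcondition, and two ($P_1$, $P_3$) once $arc(Z,c)$ is required.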

Now, let us turn our attention to universal updates. Given a universal update

$\forall(C_1\; U\; C_2)$,

we denote by $\Sigma_{C_1}$ the family of all substitutions satisfying $C_1$, i.e.,

$\Sigma_{C_1} = \{\sigma \mid \sigma \text{ is a substitution and } P \vdash \sigma(C_1)\}$.

Definition. A universal update $\forall(C_1\; U\; C_2)$ is applicable if, for each $\sigma \in \Sigma_{C_1}$,

$\tau(P, U, \Sigma_{C_1}) \vdash \sigma(C_2)$.

If $\forall(C_1\; U\; C_2)$ is applicable then $\Phi(P, \forall(C_1\; U\; C_2)) = \tau(P, U, \Sigma_{C_1})$; otherwise, $\Phi(P, \forall(C_1\; U\; C_2)) = P$. □

Hence, in the case of a universal update, say $\forall(C_1\; U\; C_2)$, the update operation $\sigma U$ is executed for each substitution $\sigma$ such that $\sigma C_1$ is verified in $P$. However, if, for some substitution $\sigma \in \Sigma_{C_1}$, $\sigma C_2$ is not true in the obtained program, then the whole execution of the update is aborted (i.e., the update produces a null effect).

Example. Let us consider the 'universal version' of the update shown in the previous example:

$\forall(arc(b,Y), arc(X,Y)\;\; -arc(X,Y))$,

which means: delete all arcs ending at a node $Y$ such that the arc $(b,Y)$ is in the graph. The substitutions verifying the applicability condition are the same as in the previous case: $\sigma_1 = \{Y/a, X/c\}$, $\sigma_2 = \{Y/c, X/b\}$, and $\sigma_3 = \{Y/a, X/b\}$. Hence, $\Sigma_{arc(b,Y), arc(X,Y)} = \{\sigma_1, \sigma_2, \sigma_3\}$. Since a universal execution modality has been specified, the update operation $-arc(X,Y)\sigma$ is executed for each $\sigma \in \Sigma_{arc(b,Y), arc(X,Y)}$. Thus, the result

$\Phi(ground(P_{graph}), \forall(arc(b,Y), arc(X,Y)\;\; -arc(X,Y)))$

of the update is $\{arc(a,b)\} \cup TC$, which agrees with the given intuition. As we already pointed out in Section 2, in a universal update the postcondition plays the role of a constraint which, if not satisfied, invalidates the whole update execution. For instance, let us add to the previous update the postcondition $tc(a,a)$, by which we request that, after the execution of the update, the node $a$ is in a cycle:

$\forall(arc(b,Y), arc(X,Y)\;\; -arc(X,Y)\;\; tc(a,a))$.

Now, the set $\Sigma_{arc(b,Y), arc(X,Y)}$ is again $\{\sigma_1, \sigma_2, \sigma_3\}$. Thus, $\tau(ground(P_{graph}), -arc(X,Y), \{\sigma_1, \sigma_2, \sigma_3\})$ is $P' = \{arc(a,b)\} \cup TC$, as before. However, the postcondition $tc(a,a)$ is not satisfied in $P'$. Hence, the update is aborted and

$\Phi(ground(P_{graph}), \forall(arc(b,Y), arc(X,Y)\;\; -arc(X,Y)\;\; tc(a,a))) = ground(P_{graph})$,

i.e., the program remains unchanged. Finally, it is worth noting that, in a universal update, the range of the variables is determined by the precondition; so, a variable appearing only in the postcondition is universally quantified.

For instance, if we change $tc(a,a)$ into $tc(a,Z)$ in the previous update, then the postcondition expresses the requirement that $tc(a,Z)$ is true for each $Z$, with $Z$ ranging over the Herbrand universe. On the contrary, since the variable $X$ also appears in the precondition, a postcondition $tc(a,X)$ would require only the truth of $tc(a,b)$ and $tc(a,c)$, as $b$ and $c$ are the values for $X$ in $\sigma_1$, $\sigma_2$ and $\sigma_3$. □
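A sketch of the universal modality in the same hypothetical Python encoding follows. It assumes that the two rules defining `tc` compute the transitive closure of `arc` (an assumption about the rules of Section 2), and that the program is function-free, so that a variable occurring only in the postcondition can be checked against the finite set of constants of the program instead of the full Herbrand universe.

```python
from itertools import product


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    s = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s


def solutions(goals, facts, subst=None):
    """Yield every substitution that makes the conjunction `goals` true in `facts`."""
    subst = {} if subst is None else subst
    if not goals:
        yield subst
        return
    for fact in facts:
        s = match(goals[0], fact, subst)
        if s is not None:
            yield from solutions(goals[1:], facts, s)


def ground(atom, s):
    return tuple(s.get(t, t) if is_var(t) else t for t in atom)


def with_tc(facts):
    """Add tc/2 facts: the transitive closure of arc/2 (assumed meaning of TC)."""
    tc = {(x, y) for (p, x, y) in facts if p == "arc"}
    while True:
        step = {(x, w) for (x, y) in tc for (z, w) in tc if y == z} - tc
        if not step:
            break
        tc |= step
    return set(facts) | {("tc", x, y) for (x, y) in tc}


def forall_update(facts, pre, op, post=()):
    """Apply op for every substitution in Sigma_pre at once, or abort."""
    sigmas = list(solutions(list(pre), with_tc(facts)))
    sign, atom = op
    touched = {ground(atom, s) for s in sigmas}
    new = facts - touched if sign == "-" else facts | touched   # tau(P, U, Sigma)
    model = with_tc(new)
    # variables occurring only in the postcondition range over the constants
    universe = sorted({c for f in facts for c in f[1:]})
    for s in sigmas:
        free = sorted({t for a in post for t in a[1:] if is_var(t) and t not in s})
        for values in product(universe, repeat=len(free)):
            full = {**s, **dict(zip(free, values))}
            if any(ground(a, full) not in model for a in post):
                return facts            # some sigma(C2) fails: the update is aborted
    return new


G1 = frozenset({("arc", "a", "b"), ("arc", "b", "a"),
                ("arc", "b", "c"), ("arc", "c", "a")})
PRE = [("arc", "b", "Y"), ("arc", "X", "Y")]
OP = ("-", ("arc", "X", "Y"))
print(sorted(forall_update(G1, PRE, OP)))                        # [('arc', 'a', 'b')]
print(forall_update(G1, PRE, OP, [("tc", "a", "a")]) == G1)      # True: aborted
```

The substitutions are applied simultaneously, which matches the reading of $\tau(P, U, \Sigma_{C_1})$ as a single step over the whole family.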

Finally,  let  us  consider  the  iterated  update. 

Definition. An iterated update $*(C_1\; U\; C_2)$ is applicable if there exists a substitution $\sigma$ such that: (i) $P \vdash \sigma(C_1)$, (ii) $\tau(P, U, \{\sigma\}) \vdash \sigma(C_2)$, and (iii) $\tau(P, U, \{\sigma\}) \neq P$. If $*(C_1\; U\; C_2)$ is applicable, then $\Phi(P, *(C_1\; U\; C_2)) = \Phi(\tau(P, U, \{\sigma\}), *(C_1\; U\; C_2))$, where $\sigma$ is any substitution verifying conditions (i), (ii), and (iii) above; otherwise, $\Phi(P, *(C_1\; U\; C_2)) = P$. □
Thus, executing an iterated update $*(C_1\; U\; C_2)$ corresponds to repeatedly applying the existential update $\exists(C_1\; U\; C_2)$ until it is no longer applicable. The further condition (iii) on applicability discards those substitutions which, though making the pre- and postconditions true, do not actually modify the program.

Example. Let us consider the iterated update, discussed in Section 2, which orients the graph of program $P_{graph}$:

$*(arc(X,Y), arc(Y,X)\;\; -arc(X,Y))$.

It is easy to see that the only applicable substitutions are $\sigma_1 = \{X/a, Y/b\}$ and $\sigma_2 = \{X/b, Y/a\}$. Hence, the effect of the first iteration is to delete either the arc $(a,b)$ or the arc $(b,a)$ from the graph. In either case no substitution is applicable in the second round (i.e., on the modified program), so the execution terminates. Formally, let $P_1 = ground(P_{graph}) - \{arc(a,b)\}$ and $P_2 = ground(P_{graph}) - \{arc(b,a)\}$; then there are two possible values for $\Phi(ground(P_{graph}), *(arc(X,Y), arc(Y,X)\;\; -arc(X,Y)))$. Either the chosen substitution is $\sigma_1$, and then

$\Phi(ground(P_{graph}), *(arc(X,Y), arc(Y,X)\;\; -arc(X,Y))) = \Phi(P_1, *(arc(X,Y), arc(Y,X)\;\; -arc(X,Y))) = P_1$;

or the chosen substitution is $\sigma_2$, and then

$\Phi(ground(P_{graph}), *(arc(X,Y), arc(Y,X)\;\; -arc(X,Y))) = \Phi(P_2, *(arc(X,Y), arc(Y,X)\;\; -arc(X,Y))) = P_2$. □

As already mentioned, in general we are not guaranteed that the execution of an iterated update terminates.

Example. The update $*(arc(X,Y), \neg arc(X,f(Y))\;\; +arc(X,f(Y)))$ keeps adding new nodes and arcs to the example graph and never terminates. □
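The iterated modality can be sketched as a search over the possible choices of $\sigma$. The Python sketch below uses hypothetical names and the same tuple encoding, with nested tuples such as `("f", "Y")` for function terms. It returns every program the execution may end in, and it stops with an error after `max_steps` rounds. That guard is an arbitrary addition here, needed because termination is not guaranteed. For the non-terminating case it uses the simplified variant $*(arc(X,Y)\;\; +arc(X,f(Y)))$, which relies on condition (iii) alone to skip arcs that are already present.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def substitute(t, s):
    """Apply substitution s to a possibly nested term such as ("f", "Y")."""
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(x, s) for x in t)
    return t


def match(atom, fact, subst):
    """Bind the (flat) arguments of atom against a ground fact, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    s = dict(subst)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s


def solutions(goals, facts, subst=None):
    subst = {} if subst is None else subst
    if not goals:
        yield subst
        return
    for fact in facts:
        s = match(goals[0], fact, subst)
        if s is not None:
            yield from solutions(goals[1:], facts, s)


def iterated_results(facts, pre, op, post=(), max_steps=100):
    """Every program the iterated update may return, over all choices of sigma.

    A step is taken only if (i) pre holds, (ii) post holds afterwards and
    (iii) the program actually changes.
    """
    sign, atom = op
    finals, frontier = set(), [(frozenset(facts), 0)]
    while frontier:
        prog, depth = frontier.pop()
        successors = set()
        for s in solutions(list(pre), prog):
            g = substitute(atom, s)
            new = prog - {g} if sign == "-" else prog | {g}
            if new != prog and next(solutions(list(post), new, s), None) is not None:
                successors.add(new)
        if not successors:
            finals.add(prog)              # no applicable sigma: execution stops here
        elif depth >= max_steps:
            raise RuntimeError("no fixpoint reached; the update may not terminate")
        else:
            frontier.extend((n, depth + 1) for n in successors)
    return finals


G1 = frozenset({("arc", "a", "b"), ("arc", "b", "a"),
                ("arc", "b", "c"), ("arc", "c", "a")})
ORIENT = [("arc", "X", "Y"), ("arc", "Y", "X")]
DEL = ("-", ("arc", "X", "Y"))
GROW = ("+", ("arc", "X", ("f", "Y")))
print(len(iterated_results(G1, ORIENT, DEL)))   # 2
try:
    iterated_results(G1, [("arc", "X", "Y")], GROW, max_steps=10)
except RuntimeError as err:
    print(err)
```

With the orientation update the search ends in exactly the two programs obtained by deleting $arc(a,b)$ or $arc(b,a)$; with the growing update every round has an applicable substitution, so the guard fires.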

For ground updates, we have the following:

Proposition. Let $P$ be a logic program, $U$ be a ground update operation, and $G_1$ and $G_2$ be two ground goals. Then

$\Phi(ground(P), \exists(G_1\; U\; G_2)) = \Phi(ground(P), \forall(G_1\; U\; G_2)) = \Phi(ground(P), *(G_1\; U\; G_2))$. □

The above result justifies dropping the modality specifier for variable-free general updates, as we have often done in the previous sections.
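The proposition can be checked exhaustively on a tiny propositional example. The Python sketch below is only a sanity check on small instances, not a proof. It encodes a ground program as a set of atoms and a ground goal as a tuple of atoms, and it implements the three modalities directly from their definitions for the ground case. The function names are hypothetical.

```python
from itertools import product


def apply(prog, op):
    """Execute a ground update operation: ('+', atom) inserts, ('-', atom) deletes."""
    sign, atom = op
    return prog - {atom} if sign == "-" else prog | {atom}


def holds(prog, goal):
    """A ground goal (a tuple of atoms) holds if every atom is in the program."""
    return all(a in prog for a in goal)


def exists_g(prog, g1, op, g2):
    new = apply(prog, op)
    return new if holds(prog, g1) and holds(new, g2) else prog


def forall_g(prog, g1, op, g2):
    sigmas = [{}] if holds(prog, g1) else []    # Sigma_G1: at most the empty substitution
    new = apply(prog, op) if sigmas else prog   # tau(P, U, Sigma_G1)
    return new if all(holds(new, g2) for _ in sigmas) else prog


def iterated_g(prog, g1, op, g2):
    while True:
        new = apply(prog, op)
        # conditions (i), (ii) and (iii) of the iterated update
        if not (holds(prog, g1) and holds(new, g2) and new != prog):
            return prog
        prog = new


atoms = ("p", "q")
goals = [(), ("p",), ("q",), ("p", "q")]
programs = [frozenset(g) for g in goals]
ops = [(sign, a) for sign in "+-" for a in atoms]
agree = all(exists_g(*case) == forall_g(*case) == iterated_g(*case)
            for case in product(programs, goals, ops, goals))
print(agree)   # True
```

The iterated version stops after at most one effective step, because repeating a ground insertion or deletion no longer changes the program, so condition (iii) fails.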

We have described how the execution of a single general update affects a ground program. The approach extends immediately to sequences of general updates: since the $\Phi$ function yields programs, it can be reapplied to its own results. As a consequence, the semantics of a sequence of $n$ general updates is given by applying the $\Phi$ function $n$ times. More formally, a sequence of general updates to be applied to a program $P$ is seen as a single compound update, whose semantics is defined next.

Definition. Given a logic program $P$, let $\Psi_1; \ldots; \Psi_n$ be a compound general update. Then

$\Phi(ground(P), \Psi_1; \ldots; \Psi_n) = \Phi(\ldots\Phi(\Phi(ground(P), \Psi_1), \Psi_2)\ldots, \Psi_n)$. □

Obviously, if one of the general updates composing the sequence is an iterated update that goes into a never-ending loop, the entire sequence also loops indefinitely.
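Since each update may have several possible results, the composition in the definition can be sketched as a left fold over sets of programs. In the Python sketch below, each update is modelled as a function from a program to the set of programs it may produce; this modelling is an assumption of the illustration, and `insert` and `delete_one_of` are hypothetical toy updates.

```python
from functools import reduce


def run_sequence(program, updates):
    """Compose updates left to right: each update maps a program to the set
    of programs it may produce (several for existential or iterated updates)."""
    def step(possible, update):
        return {q for p in possible for q in update(p)}
    return reduce(step, updates, {program})


def insert(atom):
    """A hypothetical deterministic ground update."""
    return lambda p: {p | {atom}}


def delete_one_of(a, b):
    """A hypothetical update with two possible outcomes, like an existential update."""
    return lambda p: {p - {a}, p - {b}}


start = frozenset({"x", "y"})
print(run_sequence(start, [delete_one_of("x", "y"), insert("z")]))
```

If one of the update functions never returns, as with a looping iterated update, `run_sequence` never returns either, which is the behaviour described above.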

We are finally in a position to define the semantics of an updated program, against which query answering is performed.

Definition. Let $P$ be a logic program and let $\Psi = \Psi_1, \ldots, \Psi_n$ be a list of general updates to be executed on $P$. The semantics of the program obtained by updating $P$ with $\Psi$ is defined as the well-founded model of $\Phi(ground(P), \Psi)$ if $\Phi(ground(P), \Psi)$ is defined, and is undefined otherwise.

References 


1. Abiteboul, S., Updates, a new frontier, in: Proc. Second Int. Conf. on Database Theory, LNCS 326, Springer-Verlag, 1-18, 1988.

2. Abiteboul, S. and V. Vianu, Datalog extensions for database queries and updates, JCSS, 43(1), 62-124, 1991.

3. Atzeni, P. and R. Torlone, Updating intensional predicates in datalog, Data and Knowledge Engineering, 1992.

4. Bancilhon, F. and N. Spyratos, "Update semantics of relational views", ACM TODS, 6(4), Dec. 1981.

5. Decker, H., Drawing updates from derivations, in: Proc. 3rd Int. Conf. on Database Theory, LNCS 460, Paris, 1990.

6. Grahne, G., A.O. Mendelzon and P.Z. Revesz, Knowledgebase transformations, in: Proc. 11th ACM PODS, 246-260, 1992.

7. Guessoum, A. and J.W. Lloyd, Updating knowledge bases, New Generation Computing, 8(1), 71-89, 1990.

8. Harel, D., "First-order dynamic logic", LNCS (Goos, G. and J. Hartmanis, eds.), Springer-Verlag, 1979.

9. Harel, D., "Dynamic logic", in: Handbook of Philosophical Logic (Gabbay and Guenthner, eds.), D. Reidel Publishers, 1983.

10. Kakas, T. and P. Mancarella, Database updates through abduction, in: Proc. 16th Int. Conf. on Very Large Databases (Morgan-Kaufmann, Los Altos, CA, 1990) 650-661.

11. Lloyd, J.W., Foundations of logic programming (Springer, Berlin, 2nd ed., 1987).

12. Manchanda, S. and D.S. Warren, "Towards a logical theory of database view updates", Int. Worksh. on Foundations of Deductive Databases and Logic Programming, J. Minker ed., Aug. 1988.

13. Naqvi, S. and R. Krishnamurthy, "Database updates in logic programming", ACM PODS, 1988.

14. Rossi, F. and S. Naqvi, "Contribution to the view update problem", Proc. Int. Conf. on Logic Programming, 388-415, 1989.

15. Sterling, L. and E. Shapiro, The art of Prolog, MIT Press, Cambridge, 1986.

16. Van Gelder, Ross, Schlipf, The well-founded semantics of general logic programs, Journal of ACM, 38(3), 620-650, 1991.

17. Warren, D.S., Database update in pure Prolog, in: Proc. Int. Conf. on Fifth Generation Computer Systems, 244-253, 1985.

18. Winslett, M., "A model-theoretic approach to updating logical databases", ACM PODS, 1986.


Expressing  Program  Requirements  using  Refinement  Lattices 

Dave Robertson †, Jaume Agusti ‡, Jane Hesketh †, Jordi Levy ‡
† Department of Artificial Intelligence, University of Edinburgh.
‡ IIIA, Centre d'Estudis Avançats de Blanes, Blanes, Spain.

Abstract 

Requirements capture is a term used in software engineering, referring to the process of obtaining a problem description: a high-level account of the problem which a user wants to solve. This description is then used to control the generation of a program appropriate to the solution of this problem. Reliable requirements capture is seen as a key component of future automated program construction systems, since even small amounts of information about the type of problem being tackled can often vastly reduce the space of appropriate application programs. Many special-purpose requirements capture systems exist, but few of these are logic-based and all of them operate in tightly constrained domains. This paper introduces a formal language for requirements capture which bridges the gap between an order-sorted logic of problem description and the Prolog programming language. An extended version of this paper appears in [4].

1  Introduction 

Previous work on requirements capture, described in [5], attempted to control the generation of Prolog programs by applying domain knowledge from a problem description supplied by the user. This approach is attractive because it buffers users from part of the programming task. However, there is a tension between the demands of users for a notation to which they can relate and the need for computational sophistication in their application programs. This tends to create a conceptual gap between the languages of problem description and application. The trade-offs which were made in attempting to bridge this gap are discussed in [6], but the end result is normally that the language used for problem description is different from the language used to describe the application program. This can become a serious problem if the means by which the two languages interact during program generation is not well understood.

One way to tackle these problems is to devise a language which can be used for problem description but also has a straightforward translation to an application programming language. This language has to be expressive, but it must also be easy to use. In addition, it should be capable of describing a programming problem in general terms or in greater detail, depending on users' preferences. Previous work by Bundy and Uschold ([1]) has attempted to provide this sort of uniform language based on typed lambda calculus, but they have yet to implement these ideas in a working system, and the complexity of the mathematics involved makes it difficult to see how users without specialist training could feel confident about using it. A solution to this problem would be to "dress up" the mathematics in a form which is more easily understood. Unfortunately, it is often difficult to make an inherently complex notation appear simple. An alternative, which we adopt in this paper, is to start with comparatively simple underlying principles and to manipulate these to obtain complex programs. A good source of ideas for this approach is logic programming, which (in the form of pure Prolog programs) embodies a simple but powerful programming paradigm. A second source of inspiration is to be found in recent set-based specification languages. In particular, we have drawn upon ideas from the COR system of refinements ([2]).

The core of the requirements capture language depends on representing a lattice of sets of results of predicates. This constitutes our problem description language. Section 2 introduces this notion in the context of Prolog¹. This is followed, in Section 3, by a description of the way in which expressions in the language may be translated into Prolog. Since this is intended to be a high-level language, not all of the axioms translate directly into Prolog, and some are used, with the aid of proof rules, to control problem description. In Section 4 we describe some of the proof rules which we use later, in Section 5, to provide guidance in defining set lattices. Finally, in Section 6, we describe how programs (at differing levels of detail) may be extracted from our lattices.

2  Denoting  Argument  Sets 

It is conventional to define the meaning of a logic program to be the set of ground unit goals deducible from that program. This gives a form of "global" meaning for a predicate in terms of all its arguments, but it is possible to define more local interpretations in terms of stipulated arguments. We shall use the notation $V : P$ to denote the set of instances for the variable $V$ which can be obtained from the goal $P$. In order-sorted logics, it is normal to restrict the range of objects over which variables in formulae are permitted to range. We can achieve this effect using our notation by permitting the variables inside the goal expression to be restricted using the ':' operator. This permits any predicate to be applied over sets of objects, rather than over individuals as would be the case in standard first-order predicate calculus. The interpretation of a predicate argument applied in this way is defined as the set of results for the variables on the left of the ':' operator, given the application of the predicate to every combination of elements in the sets denoted in its arguments. Since all of our terms represent sets of objects, we can introduce some standard set operators as follows:

Definition 1 If $A$ and $B$ are set expressions then $A \cap B$ is the intersection of $A$ and $B$; $A \cup B$ is the union of $A$ and $B$; and $A \supseteq B$ denotes that $B$ is a subset of $A$.
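The meaning of $V : P$ over a finite set of ground facts can be sketched in Python, with ordinary Python sets standing for the set expressions of Definition 1. The facts and the name `argument_set` are hypothetical, and a restricted argument such as $F : fish(F)$ is represented by the set it denotes.

```python
def argument_set(var, pred, args, facts):
    """V : pred(args) over a finite set of ground facts.

    Each argument is a constant, a variable (upper-case string) or a set of
    objects standing for a restricted variable such as F : fish(F).
    """
    result = set()
    for fact in facts:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        binding = {}
        for arg, value in zip(args, fact[1:]):
            if isinstance(arg, (set, frozenset)):
                ok = value in arg                              # restricted argument
            elif isinstance(arg, str) and arg[:1].isupper():
                ok = binding.setdefault(arg, value) == value   # variable
            else:
                ok = arg == value                              # constant
            if not ok:
                break
        else:                                                  # every argument matched
            if var in binding:
                result.add(binding[var])
    return result


facts = {("fish", "carp"), ("fish", "trout"), ("deer", "rudolph"),
         ("location", "carp", "river"), ("location", "trout", "lake"),
         ("location", "rudolph", "hill")}
fish = argument_set("F", "fish", ("F",), facts)                  # F : fish(F)
fish_places = argument_set("L", "location", (fish, "L"), facts)  # L : location(F : fish(F), L)
all_places = argument_set("L", "location", ("A", "L"), facts)    # L : location(A, L)
print(sorted(fish_places))                  # ['lake', 'river']
print(all_places >= fish_places)            # True: all_places ⊇ fish_places
print(sorted(fish_places & {"river", "hill"}))   # ['river']
```

Python's `>=`, `&` and `|` on sets play the roles of $\supseteq$, $\cap$ and $\cup$ here.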

¹Throughout this paper we shall be using "pure" Prolog, without complicating features such as cut or side-effecting predicates.


The use of the $\supseteq$ operator allows us to arrange our set expressions into a lattice. To provide a "top" and "bottom" to this lattice we shall use the symbol $\top$ to denote the entire universe of discourse and $\bot$ to denote the empty set of objects. The full syntax of refinement expressions appears below:

Definition 2 A refinement formula is of the form $H \supseteq B$, where:

• $H$ is the head of the refinement and is a primitive set expression.

• $B$ is the body of the refinement and can be any set expression.

• A primitive set expression is of the form $V : P$, where $V$ is a variable appearing in $P$ and $P$ is one of the following:

- A Prolog goal.

- A term of the form $Q(A_1, \ldots, A_n)$, where $Q$ is a predicate name and each $A_i$ is either a variable, a constant or a set expression.

• A set expression is of the form $V : E$, where $V$ is a variable appearing in $E$ and $E$ is one of the following:

- A primitive set expression.

- A union of set expressions, $V_1 : E_1 \cup V_2 : E_2$.

- An intersection of set expressions, $V_1 : E_1 \cap V_2 : E_2$.

- The difference between two set expressions, $V_1 : E_1 - V_2 : E_2$.

$V$ is said to be restricted by the expression $E$. Any variable which is not restricted in this way is said to be unrestricted.

• Any unrestricted variable in the head of a refinement formula must appear in the body of that formula.

The next section will make clearer why the syntactic restrictions given in Definition 2 are needed. It is worth noting in passing that set expressions for first-order predicate calculus have also been introduced in [3], but in a different form and for different purposes.

3  Mapping  Prolog  to  the  Refinement  Language 

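Section 2 introduced the basic notation for the refinement language. The purpose of this section is to show how the language can be understood in terms of Prolog. To simplify our explanation, we shall demonstrate the correspondence for unary predicates, but the same principles apply to predicates of any arity. The $\supseteq$, $\cap$, and $\cup$ operators can be interpreted straightforwardly in terms of the logical connectives for implication ($\leftarrow$), conjunction ($\&$) and disjunction ($\vee$). The correspondences are as shown below:

$V : P(V) \supseteq V : Q(V)$ corresponds to $P(V) \leftarrow Q(V)$

$V : P(V) \supseteq V : Q(V) \cap V : R(V)$ corresponds to $P(V) \leftarrow Q(V) \,\&\, R(V)$

$V : P(V) \supseteq V : Q(V) \cup V : R(V)$ corresponds to $P(V) \leftarrow Q(V) \vee R(V)$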


Any nested variable restrictions (using the ':' operator) within terms must be converted into preconditions for logical rules. Thus, if we have an expression of the form:

$V_1 : P(V_1) \supseteq V_2 : Q(V_3 : A(V_3), V_2)$   (1)

we would rewrite it to the expression:

$P(V) \leftarrow A(V_3) \,\&\, Q(V_3, V)$   (2)
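The rewriting from (1) to (2) amounts to flattening nested restrictions into preconditions. The Python sketch below is one way to do this. It rests on assumptions that are ours rather than the paper's: predicate names are written in lower case so that the output is valid Prolog, the selected variables of head and body are identified (both sides denote the same elements), and repeated goals are emitted only once. The class `R` and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R:
    """A restricted variable V : pred(args); args are variable names,
    constants (plain strings) or nested R terms."""
    var: str
    pred: str
    args: tuple


def flatten(expr, goals):
    """Return the variable standing for expr, appending the goals it needs
    (innermost restrictions first, duplicates skipped)."""
    if isinstance(expr, R):
        plain = [flatten(a, goals) for a in expr.args]
        goal = f"{expr.pred}({', '.join(plain)})"
        if goal not in goals:
            goals.append(goal)
        return expr.var
    return expr


def rename(expr, old, new):
    """Rename variable `old` to `new` throughout expr."""
    if isinstance(expr, R):
        return R(new if expr.var == old else expr.var, expr.pred,
                 tuple(rename(a, old, new) for a in expr.args))
    return new if expr == old else expr


def to_clause(head, body):
    """Translate the refinement head ⊇ body into a Prolog clause."""
    body = rename(body, body.var, head.var)              # same elements on both sides
    goals = []
    head_args = [flatten(a, goals) for a in head.args]   # head restrictions first
    flatten(body, goals)                                 # then the body's goals
    return f"{head.pred}({', '.join(head_args)}) :- {', '.join(goals)}."


head = R("V1", "p", ("V1",))                          # V1 : P(V1)
body = R("V2", "q", (R("V3", "a", ("V3",)), "V2"))    # V2 : Q(V3 : A(V3), V2)
print(to_clause(head, body))           # p(V1) :- a(V3), q(V3, V1).

red_deer = R("D", "red_deer", ("D",))
print(to_clause(R("L", "location", (red_deer, "L")), R("L", "hill", ("L",))))
# location(D, L) :- red_deer(D), hill(L).
```

The second call produces the location/2 clause that appears in the partial program of Section 6. The sketch does not check the syntactic conditions of Definition 2, nor does it guard against a body reusing the head's variable name for something else.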

It is important to remember that not all refinement formulae are intended to translate directly into Prolog. In general, the refinement relation is more "permissive" than standard implication, and with it we can represent a wide variety of information, only part of which is sufficiently precise to constitute a Prolog program. In particular, it is not always possible to translate refinements which have restricted variables in the head that do not appear in the body. Since our refinement language is, in this sense, very flexible, we must be careful about which axioms are allowed to be translated into Prolog. However, provided such checks are in place, we can benefit from the extra flexibility during problem description. For this, we need to use some standard proof rules, which are the topic of the next section.

4  Refinement  Proof  Rules 

Since all the expressions in the language refer to sets, we can use proof rules from set theory to perform many of the operations necessary during requirements capture. This section describes some of the proof rules which we currently use; we anticipate that further, derived rules will be added to the collection as the system matures, for instance rules describing the preservation of unions and intersections of predicates and a full set of rules for the set difference operator. In subsequent sections we shall show some of these rules in operation. In the proof rules which follow, the symbols $A$, $B$ and $C$ denote set expressions; $\top$ denotes the universal set; and $\bot$ denotes the empty set.

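Proof rule 1 $\vdash \top \supseteq A$

Proof rule 2 $\vdash A \supseteq \bot$

Proof rule 3 $\vdash A \supseteq A$

Proof rule 4 $A \supseteq B,\; B \supseteq C \vdash A \supseteq C$

Proof rule 5 $C \supseteq A,\; C \supseteq B \vdash C \supseteq A \cup B$

Proof rule 6 $\vdash A \cup B \supseteq A$ and $\vdash A \cup B \supseteq B$

Proof rule 7 $\vdash A \supseteq A \cap B$ and $\vdash B \supseteq A \cap B$

Proof rule 8 $B \supseteq A,\; C \supseteq A \vdash B \cap C \supseteq A$

Proof rule 9 $\vdash (A \cap B) \cup (A \cap C) \supseteq A \cap (B \cup C)$

Proof rule 10 $A \supseteq A' \vdash V : P(A) \supseteq V : P(A')$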

5  Defining  a  Refinement  Lattice 

It would be possible to define complete programs entirely within the refinement language. However, this does not seem to us to be the most advantageous use of the language, since it merely replicates a standard logic program. In defining refinement lattices, a key idea is that people should be allowed to "rise above" the level of the application program in the initial stages of refinement. When performing this task, it is common to want to combine existing refinements in order to be more specific about the way in which they apply. To support this process we permit users to restrict the size of a refinement expression on either (or both) the left or right sides. Since this could result in an overdefined expression (for example, by over-restricting the left-hand side of the refinement), we must also apply a test for overdefinition to the resulting expression (described in [4]).

Definition 3 A refinement of the form $A \supseteq B$ is an extension of the refinement lattice $\mathcal{H}$ if $A' \supseteq B' \in \mathcal{H}$ and $A' \supseteq A$ and $B' \supseteq B$.

For example, we might have added the information that locations of fish include aquatic habitats; that aquatic habitats include rivers; and that Carp are fish:

$H : aquatic\_habitat(H) \supseteq L : location(F : fish(F), L)$   (3)

$H : aquatic\_habitat(H) \supseteq R : river(R)$   (4)

$F : fish(F) \supseteq C : carp(C)$   (5)

Now, if we add the information that the locations of Carp include rivers,

$R : river(R) \supseteq L : location(C : carp(C), L)$   (6)

we can show that this is a valid extension as follows:

• By Definition 3 using axiom 3, we have an extension if:

$L : location(F : fish(F), L) \supseteq L : location(C : carp(C), L)$ and
$H : aquatic\_habitat(H) \supseteq R : river(R)$

• By proof rule 10 we can establish that:

$L : location(F : fish(F), L) \supseteq L : location(C : carp(C), L)$ if $F : fish(F) \supseteq C : carp(C)$.

• $F : fish(F) \supseteq C : carp(C)$ by proof rule 4 using axiom 5.

• $H : aquatic\_habitat(H) \supseteq R : river(R)$ by proof rule 4 using axiom 4.
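The derivation above can be mechanised for this fragment. The Python sketch below uses a hypothetical encoding: plain names for unary set expressions, and a tuple such as `("location", "fish")` for $L : location(F : fish(F), L)$. It searches for a proof of $a \supseteq b$ using only proof rules 1 to 4 and 10, and then checks Definition 3. It is a partial procedure: unions, intersections and differences are not handled.

```python
TOP, BOTTOM = "top", "bottom"


def entails(axioms, a, b, path=frozenset()):
    """Try to derive a ⊇ b from axioms, a set of pairs (x, y) read as x ⊇ y."""
    if a == b or a == TOP or b == BOTTOM:           # rules 3, 1 and 2
        return True
    if (a, b) in path:                              # avoid looping on cycles
        return False
    path = path | {(a, b)}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)
            and all(entails(axioms, x, y, path) for x, y in zip(a[1:], b[1:]))):
        return True                                 # rule 10, argument by argument
    # rule 4: a ⊇ c is an axiom and c ⊇ b can be derived
    return any(entails(axioms, c, b, path) for (x, c) in axioms if x == a)


def is_extension(axioms, a, b):
    """Definition 3: a ⊇ b extends the lattice if some axiom a2 ⊇ b2
    satisfies a2 ⊇ a and b2 ⊇ b."""
    return any(entails(axioms, a2, a) and entails(axioms, b2, b)
               for (a2, b2) in axioms)


AXIOMS = {
    ("aquatic_habitat", ("location", "fish")),   # axiom (3)
    ("aquatic_habitat", "river"),                # axiom (4)
    ("fish", "carp"),                            # axiom (5)
}
print(is_extension(AXIOMS, "river", ("location", "carp")))   # True: refinement (6)
print(is_extension(AXIOMS, "carp", "river"))                  # False
```

The first call follows the same route as the bullets above: axiom 3 is the chosen $A' \supseteq B'$, rule 10 reduces the location comparison to $fish \supseteq carp$, and axioms 4 and 5 close the remaining steps.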

We would like, as far as possible, to protect users against including refinements which are overdefined within the existing lattice. We use the symbol $\bot$ as the empty set expression and assume that all new sets added will be (potentially) larger than $\bot$. Therefore:

Definition 4 A refinement $A \supseteq B$ is overdefined if, in conjunction with the other axioms of the existing refinement lattice, $A \supseteq B \vdash \bot \supseteq B$.

One of the main purposes of this definition is to limit the ways in which set expressions can be refined, thus reducing the range of choices available to users when constructing the lattice. For example, it is sometimes useful to be able to define a predicate which can range over only particular sets of arguments but not others. For instance, we might want to say that spiders only eat living things. We can express this using the axiom:

$X : living(X) \supseteq X_1 : eats(S : spider(S), X_1)$   (7)

If we also add the constraint that nothing is both living and dead:

$\bot \supseteq X_1 : living(X_1) \cap X_2 : dead(X_2)$   (8)

then we can protect against generalisation of the eats/2 predicate. For example, if we try to add the axiom:

$X : eats(S : spider(S), X) \supseteq X_1 : dead(X_1)$   (9)

we can prove that this is overdefined as follows:

• By proof rule 4, $\bot \supseteq X : dead(X)$ if:

$\bot \supseteq X_1 : living(X_1) \cap X_2 : dead(X_2)$ (Axiom 8) and
$X_1 : living(X_1) \cap X_2 : dead(X_2) \supseteq X : dead(X)$

• By proof rule 8, $X_1 : living(X_1) \cap X_2 : dead(X_2) \supseteq X : dead(X)$ if:

$X_1 : living(X_1) \supseteq X : dead(X)$ and
$X_2 : dead(X_2) \supseteq X : dead(X)$

• By proof rule 4, $X_1 : living(X_1) \supseteq X : dead(X)$ if:

$X_1 : living(X_1) \supseteq X_2 : eats(S : spider(S), X_2)$ (Axiom 7) and
$X_2 : eats(S : spider(S), X_2) \supseteq X : dead(X)$ (Axiom 9)

• By proof rule 3, $X_2 : dead(X_2) \supseteq X : dead(X)$.
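The overdefinition test for this example can be sketched in the same style. The Python code below covers only the proof pattern used above. With the new axiom added, it collects every expression derivable as a superset of $B$ (rules 3 and 4). It then reports an overdefinition when two of those expressions are declared disjoint by an axiom of the form $\bot \supseteq X \cap Y$ (rule 8). The encoding, including `("and", x, y)` for an intersection, is hypothetical. A `False` answer does not prove that a refinement is safe.

```python
BOTTOM = "bottom"


def supersets(axioms, b):
    """Every expression x with x ⊇ b derivable from axioms by rules 3 and 4."""
    found, todo = {b}, [b]
    while todo:
        y = todo.pop()
        for (x, z) in axioms:
            if z == y and x not in found:
                found.add(x)
                todo.append(x)
    return found


def overdefined(axioms, new):
    """Definition 4 for one pattern: after adding `new` = (a, b), b is forced
    empty if it lies below both x and y for some axiom ⊥ ⊇ x ∩ y."""
    axioms = set(axioms) | {new}
    above_b = supersets(axioms, new[1])
    for (lhs, rhs) in axioms:
        if lhs == BOTTOM and isinstance(rhs, tuple) and rhs[0] == "and":
            _, x, y = rhs
            if x in above_b and y in above_b:
                return True
    return False


EATS = ("eats", "spider")      # stands for X : eats(S : spider(S), X)
AXIOMS = {
    ("living", EATS),                       # axiom (7)
    (BOTTOM, ("and", "living", "dead")),    # axiom (8)
}
print(overdefined(AXIOMS, (EATS, "dead")))   # True: axiom (9) is rejected
print(overdefined(AXIOMS, (EATS, "fly")))    # False
```

For axiom (9) the supersets of $dead$ are $dead$ itself, the $eats$ expression and $living$, and axiom (8) declares $living$ and $dead$ disjoint, which is the same chain of reasoning as the derivation above.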


5.1  A  Simple  Example 

Having defined mechanisms for creating and extending refinements, we introduce in this section a short example to demonstrate the way in which the language may be used to develop a requirements specification incrementally. We shall use a (somewhat contrived) biological example, in which we wish to represent populations of wolves and deer which have different probabilities of survival depending on their location. A larger example can be found in [4]. To begin, we can introduce the concept of probabilities using the refinement:

$\top \supseteq P : probability(P)$   (10)

We could then go on to provide more specific information pertaining to probabilities. In particular, we could say that a more restricted type of probability is the survival factor of animals:

$P : probability(P) \supseteq S : survival(A : animal(A), S)$   (11)

At this point, we have introduced, as part of expression 11, a requirement for animal/1 to be placed in the lattice. This is flagged as one of the gaps in the requirement specification, and we plug this gap by adding animal/1 below $\top$. At the same time, it is convenient to add wolf/1 and deer/1 as refinements of animal/1, and red_deer/1 as a refinement of deer/1:

$\top \supseteq A : animal(A)$   (12)

$A : animal(A) \supseteq W : wolf(W)$   (13)

$A : animal(A) \supseteq D : deer(D)$   (14)

$D : deer(D) \supseteq R : red\_deer(R)$   (15)

We might then decide to introduce a refinement of survival which is dependent on the location of the animals:

$S : survival(A : animal(A), S) \supseteq F : fl(L : location(A : animal(A), L), B : animal(B), F)$   (16)


This again introduces a gap in the specification, for location/2, which we first introduce below $\top$ and then define using two axioms:

$\top \supseteq L : location(A : animal(A), L)$   (17)

$L : location(A : animal(A), L) \supseteq H : hill(H)$   (18)

$L : location(A : animal(A), L) \supseteq P : pasture(P)$   (19)

We might now decide to be more specific about the types of results which we would expect to obtain from fl/3. For example, we could stipulate that the results in the third argument for deer on hills might be the integers between 50 and 100, while the same argument for wolves on hills might be the integers between 40 and 60:

$F : fl(L : hill(L), A : deer(A), F) \supseteq N : between(50, 100, N)$   (20)

$F : fl(L : hill(L), A : wolf(A), F) \supseteq N : between(40, 60, N)$   (21)

Finally, we could be more specific about the locations of particular groups of animals. For example, we could state that the possible locations for red deer are hills:

$L : location(A : red\_deer(A), L) \supseteq H : hill(H)$   (22)


6  Extracting  a  Program 

In Section 5 we demonstrated how a lattice of refinements could be constructed. This lattice is capable of describing a large number of different programs, which vary along two dimensions:

• The level of detail at which a program in the lattice is described will vary depending on the depth to which we descend through the chains of refinement. The further we travel towards the bottom of the lattice, the more detailed our programs become.

• There may be more than one possible refinement of a set expression at any given point in the lattice. These produce choice points in the extraction of program details.

Bearing the above considerations in mind, the method used to extract a program from the refinement lattice is based on a simple principle. Recall the mapping between refinements and implication shown in Section 3. Using this mapping, if we take any sequence of refinements down through the lattice from some top-level set expression, then by translating the refinements of that sequence into Prolog axioms we obtain a partial program whose results are included in the top-level set expression. Some additional complexity is introduced into the algorithm because we permit nesting of set expressions. This means that when we are finding sequences of refinements we need to do more than simply match the left and right sides of the appropriate refinements: we also need to ensure that set expressions contained in the matching expressions can be coerced toward a non-empty intersection. The full refinement algorithm is given below. Note the recursive use of the algorithm when unifying set expressions, and also the need to propagate the set intersections from unification through the right-hand side of the smaller of the refinement expressions.
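For unary refinements without nesting, the extraction principle just described reduces to enumerating downward chains and translating each step into a clause. The Python sketch below does only that. The names are hypothetical, and the lattice is assumed to be acyclic. The full algorithm below is needed once set expressions are nested.

```python
def chains(axioms, start):
    """All downward refinement chains from `start`, as lists of
    (upper, lower) steps; the lattice is assumed to be acyclic."""
    result = []
    for upper, lower in sorted(axioms):
        if upper == start:
            result.append([(upper, lower)])
            result.extend([(upper, lower)] + rest for rest in chains(axioms, lower))
    return result


def to_program(chain):
    """Translate each step V : p(V) ⊇ V : q(V) into the clause p(X) :- q(X)."""
    return [f"{upper}(X) :- {lower}(X)." for upper, lower in chain]


# axioms (13)-(15) of Section 5.1, restricted to their unary predicates
AXIOMS = {("animal", "wolf"), ("animal", "deer"), ("deer", "red_deer")}
for chain in chains(AXIOMS, "animal"):
    print(to_program(chain))
```

Each printed list is one partial program; longer chains give more detailed programs, and the two refinements of animal/1 are the kind of choice point mentioned in the second bullet above.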

Algorithm 1. We write refinement(T1, T2, P) to denote that T2 is a valid refinement of T1, producing axiom set P, given refinement lattice ℋ. The algorithm for this is as follows:

•  refinement(T1, T2, P) if refinement(T1, T2, {}, P)

•  refinement(T, T, P, P)

•  refinement(V : A, V : B, P, P'') if

-  (V : A' ⊃ V : B') ∈ ℋ and

-  unify(A, A', Au, P, P') and

-  propagate_bindings(Au, B', Bu) and

-  refinement(V : Bu, V : B, P', P'')

•  refinement(P(A1, ..., An), P(A'1, ..., A'n), P, P') if

-  for each Ai and A'i: refinement(Ai, A'i, Pi), and P' = ⋃i Pi ∪ P

•  refinement(V : A, V : B1 ∩ B2, P, P'') if

-  refinement(V : A, V : B1, P, P') and refinement(V : A, V : B2, P', P'')
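As a rough illustration of how Algorithm 1 threads the axiom set through a search of the lattice, the Python sketch below handles only the flat case: every set expression ranges over one shared variable and is written as a bare predicate name, and the lattice is a list of (upper, lower) pairs. It omits unification of nested set expressions and binding propagation, and it assumes (our reading, not stated in the rules above) that each lattice refinement the search passes through is recorded as an axiom. The function name `refinements`, the constant `LATTICE` and the `("and", B1, B2)` encoding of B1 ∩ B2 are all hypothetical.

```python
from typing import Iterator, Tuple, Union

# Hypothetical encoding: V : p(V) is written "p"; ("and", B1, B2) is B1 ∩ B2.
SetExp = Union[str, tuple]
Edge = Tuple[str, str]  # (upper, lower) stands for  V : upper ⊃ V : lower


def refinements(t1: SetExp, t2: SetExp, lattice: list,
                axioms: Tuple[Edge, ...] = ()) -> Iterator[Tuple[Edge, ...]]:
    """Yield each axiom set showing that t2 refines t1 (flat case only)."""
    # refinement(T, T, P, P): every expression refines itself.
    if t1 == t2:
        yield axioms
    # refinement(V:A, V:B1 ∩ B2, P, P''): refine A to both conjuncts,
    # threading the axiom set of the first call into the second.
    if isinstance(t2, tuple) and t2[0] == "and":
        for p1 in refinements(t1, t2[1], lattice, axioms):
            yield from refinements(t1, t2[2], lattice, p1)
    # refinement(V:A, V:B, P, P''): follow a lattice edge out of A.
    # In the flat case, unify(A, A', ...) reduces to A == A'.
    if isinstance(t1, str):
        for upper, lower in lattice:
            if upper == t1:
                yield from refinements(lower, t2, lattice,
                                       axioms + ((upper, lower),))


LATTICE = [("animal", "deer"), ("deer", "red_deer"), ("animal", "herbivore")]

print(list(refinements("animal", "red_deer", LATTICE)))
# [(('animal', 'deer'), ('deer', 'red_deer'))]
```

Because the sketch is a generator, alternative refinement paths appear as alternative yielded axiom sets, mirroring the choice points mentioned earlier.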

Algorithm 2. We write unify(A, A', Au, P) to denote that set expressions A and A' have a shared subset defined by set expression Au, yielding axiom set P.

The algorithm for this is as follows:

•  unify(A, A', Au, P) if unify(A, A', Au, {}, P)

•  unify(A, A', Au, P, P'') if

-  refinement(A, Au, P, P') and refinement(A', Au, P', P'')

Algorithm 3. The procedure propagate_bindings(A, B, B') takes each term of the form V : X contained in A and replaces every occurrence of V : _ in B with V : X, yielding the new term B'.
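Algorithm 3 is a plain substitution over terms. The Python sketch below uses a hypothetical encoding (strings for variables, tuples `(functor, arg1, ..., argn)` for compound terms, and a small `SetExpr` class for V : X) and reads V : _ as a set expression on V whose body may be anything; the placeholder body is represented by `None`. Function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SetExpr:
    """Hypothetical encoding of the set expression  var : term."""
    var: str
    term: Optional[tuple]  # None plays the role of the anonymous body "_"


def collect_bindings(a, found: Dict[str, SetExpr]) -> Dict[str, SetExpr]:
    """Record every V : X occurring, at any depth, in term a."""
    if isinstance(a, SetExpr):
        found[a.var] = a
        collect_bindings(a.term, found)
    elif isinstance(a, tuple):
        for arg in a[1:]:
            collect_bindings(arg, found)
    return found


def substitute(b, bindings: Dict[str, SetExpr]):
    """Replace each set expression on a bound variable V in b by V : X."""
    if isinstance(b, SetExpr):
        if b.var in bindings:
            return bindings[b.var]
        return SetExpr(b.var, substitute(b.term, bindings))
    if isinstance(b, tuple):
        return (b[0],) + tuple(substitute(arg, bindings) for arg in b[1:])
    return b  # plain variable, constant or None


def propagate_bindings(a, b):
    return substitute(b, collect_bindings(a, {}))


au = ("location", SetExpr("D", ("red_deer", "D")), "L")
b = ("fl", SetExpr("L", None), SetExpr("D", None), "S")
print(propagate_bindings(au, b))
```

Here only D is bound to a set expression in `au`, so only the D : _ placeholder in `b` is filled in; L : _ is left untouched.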

Using this refinement algorithm, in combination with axioms 10 to 22 from Section 5.1, we can extract a variety of sets of refinements from the refinement lattice, including the one below:

D : deer(D) ⊃ D : red_deer(D)

D : animal(D) ⊃ D : deer(D)

L : location(D : red_deer(D), L) ⊃ L : hill(L)

S : fl(L : hill(L), D : red_deer(D), S) ⊃ S : between(60, 100, S)

S : survival(D : red_deer(D), S) ⊃ S : fl(L : location(D : red_deer(D), L), D : red_deer(D), S)

Applying the translation algorithm to these refinements gives the partial program:

deer(D) ← red_deer(D)
animal(D) ← deer(D)
location(D, L) ← red_deer(D) & hill(L)
fl(L, D, S) ← hill(L) & red_deer(D) & between(60, 100, S)
survival(D, S) ← red_deer(D) & location(D, L) & fl(L, D, S)
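In this example the translation appears to follow a simple pattern: each nested set expression V : X is replaced by V, and X (itself flattened) becomes an extra body goal; the right-hand side of the refinement supplies the final goal, and repeated goals are dropped. The Python sketch below reproduces the clauses above under that reading. The `SetExpr` encoding, the flattening order and the function names are our assumptions, not the paper's stated translation algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetExpr:
    """Hypothetical encoding of  var : term  (str = variable, int = constant,
    tuple (functor, args...) = compound term)."""
    var: str
    term: tuple


def flatten(t, goals):
    """Replace each nested V : X in t by V, appending flattened X to goals."""
    if isinstance(t, SetExpr):
        goals.append(flatten(t.term, goals))
        return t.var
    if isinstance(t, tuple):
        return (t[0],) + tuple(flatten(a, goals) for a in t[1:])
    return t


def translate(lhs, rhs):
    """Turn the refinement  lhs ⊃ rhs  into (head, body)."""
    goals = []
    head = flatten(lhs.term, goals)   # goals from sets nested in the head
    last = flatten(rhs.term, goals)   # goals from sets nested in the body
    body = []
    for g in goals + [last]:
        if g not in body:             # keep only the first copy of a goal
            body.append(g)
    return head, body


def show(t):
    if isinstance(t, tuple):
        return f"{t[0]}({', '.join(show(a) for a in t[1:])})"
    return str(t)


def clause(lhs, rhs):
    head, body = translate(lhs, rhs)
    return f"{show(head)} <- {' & '.join(show(g) for g in body)}"


R = SetExpr
red_deer = R("D", ("red_deer", "D"))
print(clause(R("L", ("location", red_deer, "L")), R("L", ("hill", "L"))))
# location(D, L) <- red_deer(D) & hill(L)
print(clause(R("S", ("survival", red_deer, "S")),
             R("S", ("fl", R("L", ("location", red_deer, "L")), red_deer, "S"))))
# survival(D, S) <- red_deer(D) & location(D, L) & fl(L, D, S)
```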


7  Conclusions 


The  language  introduced  in  this  paper  embodies  what  we  claim  to  be  a  novel 
approach  to  requirements  capture.  It  has  the  following  features: 

•  The space of requirements is described using a lattice of refinements between sets of potential results from Prolog programs.

•  Construction of a Prolog program can be achieved by searching this requirement space, having delimited the upper and lower bounds within which the completed (partial) program must lie.

•  Guidance  during  the  construction  of  the  refinement  lattice  is  obtained  by 
the  application  of  logically  consistent  set-theoretic  proof  rules. 


References 

[1] A. Bundy and M. Uschold. The use of typed lambda calculus for requirements capture in the domain of ecological modelling. Research Paper 446, Dept. of Artificial Intelligence, Edinburgh, 1989.

[2] J. Levy, J. Agusti, F. Esteva, and P. Garcia. An ideal model of an extended lambda-calculus with refinement. ECS-LFCS-91-188, Laboratory for the Foundations of Computer Science, 1991.

[3] D. McAllester, B. Givan, and T. Fatima. Taxonomic syntax for first order inference. In Proceedings of KR-89, 1989.

[4] D. Robertson, J. Agusti, J. Hesketh, and J. Levy. Expressing program requirements using refinement lattices. Research paper, Department of Artificial Intelligence, University of Edinburgh, 1992. Longer version of paper submitted to ISMIS-93.

[5] D. Robertson, A. Bundy, R. Muetzelfeldt, M. Haggith, and M. Uschold. Eco-Logic: Logic-Based Approaches to Ecological Modelling. MIT Press (Logic Programming Series), 1991. ISBN 0-262-18143-6.

[6] D. Robertson, M. Uschold, A. Bundy, and R. Muetzelfeldt. The ECO program construction system: Ways of increasing its representational power and their effects on the user interface. International Journal of Man-Machine Studies, 31:1-26, 1988.




Finding  Logical  Consequences  Using  Unskolemization 

Ritu  Chadha  and  David  A.  Plaisted 
Department  of  Computer  Science,  University  of  North  Carolina 
Chapel Hill, N.C. 27599-3175.*

Abstract: This paper presents a method for deriving logical consequences of first-order formulas based on resolution and a novel unskolemization algorithm. In general, it is not possible to derive certain logical consequences of a first-order formula by resolution without using tautologies or unskolemization. Therefore, if a formula H implies a formula W, we will not attempt to derive W from H; instead, we derive a formula F such that H implies F and F implies W, and such that F is “close” to W. A measure of closeness is defined such that the number of formulas F “close” to any given formula W is finite. A number of interesting applications are discussed, including a method for mechanically generating loop invariants for program verification, and a technique for learning characteristic descriptions of objects.

1.  Objective  and  Motivation 

In  this  paper,  we  will  develop  a  method  for  finding  logical  consequences 
of  first-order  formulas.  Suppose  we  are  given  a  first-order  formula  H,  and  we 
want  to  find  a  certain  consequence  W  of  H,  which  is  unknown.  It  may  not  be 
possible  to  derive  W  from  H  by  resolution  [9]  without  using  tautologies  and 
unskolemization,  as  will  be  shown  in  Section  2.1.  Since  the  use  of  tautologies  is 
undesirable  (due  to  the  enormous  increase  in  search  space  that  it  creates),  we 
will  not  attempt  to  derive  W  from  H,  but  instead  will  try  to  derive  a  formula  F 
with  the  property  that 

H  =>  F  =>  W. 

(where => denotes logical implication). However, if this is the only constraint on F, then why not take F = H? One obvious reason is that H may be infinite. Also, we want F to be as “close” as possible to W. To define the concept of “closeness”, we will define a relation “more general than” on first-order formulas and will derive a formula F from H such that H => F => W and such that F is “more general than” W. The relation “more general than” is defined in such a way that the number of first-order formulas F which satisfy a given syntactic condition and are more general than a given first-order formula W is finite up to variants. Thus we can only derive a finite number of formulas F satisfying both of the following conditions:

(i) H => F => W

(ii)  F  is  more  general  than  W. 

Of course, if H is more general than W, then we could get F = H. We will show that this method is complete, i.e. that for any two formulas H and W such that H => W, it is possible to derive F from H by our method such that (i) and (ii) above hold.

This paper is structured as follows. In the next section, we show why certain logical consequences of first-order formulas cannot be derived without using tautologies or unskolemization, and describe an unskolemization algorithm. The algorithm is analyzed in Section 3, where we define the relation “more general than”. Several applications of the methods developed in this paper are described in Section 4. Section 5 concludes with a brief summary.

* Ritu Chadha may be contacted at the following address: Bell Communications Research, MRE 2A-24G, 445 South Street, Morristown, NJ 07962-1910.

2.  The  Unskolemization  Process 

2.1  Preliminaries 

Unskolemization has been defined as the process of eliminating Skolem functions from a formula without quantifiers, replacing them with new existentially quantified variables, and transforming the resulting formula into a closed formula with quantifiers (for details about skolemization, see [2,7]). McCune [8] presents an algorithm to solve the following problem: given a set S of clauses and a set F of constant and function symbols that occur in the clauses of S, obtain a fully quantified (closed) formula S' from S by replacing expressions starting with symbols in F with existentially quantified variables. McCune’s algorithm is sound but not complete. Cox and Pietrzykowski [3] present an algorithm for unskolemization, but their algorithm is applicable only to literals.

We expand the meaning of unskolemization slightly. In our definition, function symbols can also be “unskolemized” by treating them as if they were Skolem functions. Thus, a function symbol may be replaced by an existentially quantified variable during unskolemization. To illustrate, suppose we want to unskolemize the formula $\forall x\,(P(f(x)) \lor Q(g(a), x))$, where $f$ and $a$ are (non-Skolem) function symbols, and suppose we want to treat $f$ and $a$ as if they were Skolem functions. The resulting formula would be $\exists z\,\forall x\,\exists y\,(P(y) \lor Q(g(z), x))$. Note that skolemizing $\exists z\,\forall x\,\exists y\,(P(y) \lor Q(g(z), x))$ yields the original formula (up to names of Skolem functions). In practice, the situation may be more complicated, since the formula being unskolemized may not be the skolemized form of any formula. Our algorithm shows how to cope with such situations.
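To make the correspondence in this example concrete, here is a small Python sketch of ordinary skolemization for formulas already in prenex form, applied to $\exists z\,\forall x\,\exists y\,(P(y) \lor Q(g(z), x))$. Each existential variable is replaced by a fresh function of the universal variables that precede it. The term encoding (strings for variables, tuples for compound terms and constants) and the fresh names `sk0`, `sk1` are assumptions made for the illustration; bound variables are assumed to be distinct.

```python
from itertools import count


def substitute(t, sub):
    """Apply a variable-to-term substitution to a term or quantifier-free formula."""
    if isinstance(t, str):
        return sub.get(t, t)
    return (t[0],) + tuple(substitute(a, sub) for a in t[1:])


def skolemize(prefix, matrix):
    """prefix: list of ("forall" | "exists", var); matrix: quantifier-free."""
    fresh = count()
    universals, sub = [], {}
    for quant, var in prefix:
        if quant == "forall":
            universals.append(var)
        else:
            # ∃var becomes sk_k(u1, ..., un) over the preceding ∀-variables
            sub[var] = (f"sk{next(fresh)}",) + tuple(universals)
    return substitute(matrix, sub)


prefix = [("exists", "z"), ("forall", "x"), ("exists", "y")]
matrix = ("or", ("P", "y"), ("Q", ("g", "z"), "x"))
print(skolemize(prefix, matrix))
# ('or', ('P', ('sk1', 'x')), ('Q', ('g', ('sk0',)), 'x'))
```

Renaming `sk1` to f and the constant `sk0` to a gives back $\forall x\,(P(f(x)) \lor Q(g(a), x))$ (the remaining x being implicitly universal), as stated above.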

We motivate the development of the unskolemization algorithm by the following example. Suppose we want to derive an unknown logical consequence $B$ of $A$. Denote the Skolem form of a formula $F$ by “Sk($F$)”. Since $A \Rightarrow B$, $A \land \neg B$ is unsatisfiable, so $\mathrm{Sk}(A \land \neg B)$ is unsatisfiable (since skolemization preserves unsatisfiability), i.e. $\mathrm{Sk}(A) \land \mathrm{Sk}(\neg B)$ is unsatisfiable. Therefore we can derive the empty clause from $\mathrm{Sk}(A) \land \mathrm{Sk}(\neg B)$. Now, $B$ is unknown, and we want to derive it from $\mathrm{Sk}(A)$. It may not be possible to derive $B$ from $\mathrm{Sk}(A)$ without using tautologies or unskolemization, as demonstrated below:

(i) Suppose $A = P$ and $B = P \lor Q \lor R$. Clearly $A \Rightarrow B$. But the only way to derive $B$ from $A$ by resolution is by resolving $A$ with the tautology $\neg P \lor P \lor Q \lor R$.

(ii) Suppose $A = P(a)$ and $B = \exists x\,P(x)$. Then $B$ (or even $\mathrm{Sk}(B)$) cannot be derived from $A$ by resolution. Obtaining $B$ from $A$ requires unskolemizing $A$ by replacing “$a$” by an existentially quantified variable. Unskolemizing $P(a)$ results in $\exists x\,P(x)$. In practice, some function symbols in $A$ may have to be replaced by existential quantifiers and some may not. This explains why our unskolemization algorithm will be nondeterministic.
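Case (i) can be checked with a few lines of propositional resolution. In the Python sketch below (our own encoding: a clause is a frozenset of literal strings, with a leading `~` for negation), resolving $A = \{P\}$ with the tautology $\neg P \lor P \lor Q \lor R$ yields exactly the clause $P \lor Q \lor R$, while resolving $\{P\}$ with a clause that lacks $\neg P$ yields nothing.

```python
def negate(lit: str) -> str:
    """Complement of a propositional literal in the "~" encoding."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: frozenset, c2: frozenset) -> set:
    """All binary resolvents of two propositional clauses."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add((c1 - {lit}) | (c2 - {negate(lit)}))
    return out


A = frozenset({"P"})
tautology = frozenset({"~P", "P", "Q", "R"})
print(resolvents(A, tautology) == {frozenset({"P", "Q", "R"})})  # True
```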

In order to address the above issues formally, we present an unskolemization algorithm U with the following specifications:

INPUT: a first-order formula H

OUTPUT: a set $\mathcal{E}$ of formulas such that for any logical consequence W of H, algorithm U can produce a formula F in $\mathcal{E}$ such that
algorithm  U  can  produce  a  formula  F  in  C  such  that 

(i)  H  =>  F  =>  W 

(ii)  F  is  more  general  than  W 


where “more general than” is a relation, defined later, with the property that $\{F \mid F \text{ is more general than } W\}$ is finite up to variants under certain syntactic constraints. ■

The algorithm U unskolemizes a set of clauses $\mathcal{D}$ derived by resolution from Sk(H) to give a set of formulas $\mathcal{E}$. Briefly, the objective of unskolemizing $\mathcal{D}$ is to replace function symbols of $\mathcal{D}$ that do not occur in W by existentially quantified variables. That is, if for some literal L in $\mathcal{D}$, an argument d of L has a function symbol that does not appear in W, then that function symbol of d is unskolemized, yielding a set $\mathcal{E}$ of new formulas. Thus any $F \in \mathcal{E}$ will contain a new existentially quantified variable in place of d. Since W is unknown, this procedure will have to be carried out nondeterministically. This process will make the unskolemized formula “more general than” W.

Notes. 1. The following algorithm makes use of the guarded command for conditional statements [5]. Briefly, the general form of a conditional statement is “if $B_1 \to S_1$ [] $B_2 \to S_2$ [] ... [] $B_n \to S_n$ fi”, where n > 0 and each $B_i \to S_i$ is a guarded command. Each $S_i$ can be any statement. The command is executed as follows. If any guard $B_i$ is not well-defined, or if none of the guards is true, abortion occurs; if at least one guard is true, then one guarded command $B_i \to S_i$ with true guard $B_i$ is chosen and $S_i$ is executed. If more than one guard is true, then one of the guarded commands $B_i \to S_i$ with true guard $B_i$ is chosen arbitrarily and $S_i$ is executed. Thus the execution of such a statement can be nondeterministic.
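A minimal Python sketch of the `if ... fi` semantics just described: all guards are evaluated, the statement aborts if none holds, and otherwise one enabled command is picked arbitrarily. Modelling "arbitrarily" with `random.choice`, and treating a guard that raises an exception as "not well-defined", are our own choices; the names `guarded_if` and `Abort` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple


class Abort(Exception):
    """Signals that an if ... fi statement aborts."""


Command = Tuple[Callable[[], bool], Callable[[], object]]


def guarded_if(commands: List[Command], rng: Optional[random.Random] = None):
    """Execute  if B1 -> S1 [] ... [] Bn -> Sn fi  once; return the chosen S's value."""
    chooser = rng if rng is not None else random
    enabled = []
    for guard, stmt in commands:
        try:
            holds = guard()
        except Exception as exc:       # stands in for "guard not well-defined"
            raise Abort("a guard is not well-defined") from exc
        if holds:
            enabled.append(stmt)
    if not enabled:
        raise Abort("no guard is true")
    return chooser.choice(enabled)()   # arbitrary choice among true guards


x = 3
print(guarded_if([(lambda: x >= 0, lambda: "non-negative"),
                  (lambda: x <= 0, lambda: "non-positive")]))  # non-negative
```

With x = 0 both guards hold and either string may be returned, which is exactly the nondeterminism that algorithm U exploits.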

2. The following notation is used:

(i) $L = \mathrm{SIGN}(L)\,P(a_1, \ldots, a_n)$ is a literal whose sign (negated or unnegated) is represented by “SIGN(L)”; e.g. SIGN($Q(a)$) = “ ”, SIGN($\neg Q(a)$) = “$\neg$”.

(ii) Let X be a term. FUNC(X) is defined to be the function symbol of X if X is not a variable, and is defined to be X otherwise. For example, FUNC($f(x, y)$) = $f$; FUNC($a$) = $a$; FUNC($x$) = $x$, where $x$ is a variable.
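The two notational helpers are easy to state in code. The Python sketch assumes a hypothetical encoding in which a variable is a string, a constant or compound term is a tuple `(symbol, arg1, ..., argn)` (so the constant a is `("a",)`), and a literal is a pair `(negated, atom)`; `sign` and `func` stand for SIGN and FUNC.

```python
from typing import Tuple, Union

# Hypothetical encoding: str = variable; tuple (symbol, *args) = term.
Term = Union[str, tuple]
Literal = Tuple[bool, tuple]          # (negated?, atom)


def sign(lit: Literal) -> str:
    """SIGN(L): "¬" for a negated literal, the empty string otherwise."""
    negated, _atom = lit
    return "¬" if negated else ""


def func(x: Term) -> str:
    """FUNC(X): the function symbol of X, or X itself if X is a variable."""
    return x if isinstance(x, str) else x[0]


print(func(("f", "x", "y")), func(("a",)), func("x"))                   # f a x
print(repr(sign((False, ("Q", ("a",))))), sign((True, ("Q", ("a",)))))  # '' ¬
```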

2.2  The  Unskolemization  Algorithm 
ALGORITHM  U 

Step 1. Skolemize the input formula H. Let SK be the set of all the Skolem symbols in Sk(H). Derive a set $\mathcal{D}$ of clauses by resolution from Sk(H).

Step 2. Make $i_k$ copies of every clause $C_k$ of $\mathcal{D}$, where $i_k$ is some integer (chosen nondeterministically), and rename variables in all clauses so that no two clauses have any variable in common. Call the resulting bag of clauses M_CLAUSES.
Comment: We may need multiple copies of clauses because multiple instances of a clause may be needed to derive the empty clause from $\mathrm{Sk}(H \land \neg W)$. $i_k$ can be bounded by the number of resolutions performed when deriving the empty clause from $\mathrm{Sk}(H \land \neg W)$. In actual practice, for each k, we can try setting $i_k$ to 1, then 2, and so on, and eventually $i_k$ will become large enough.

Step 3. For every literal L in every clause of M_CLAUSES, process the arguments of L as follows. Suppose $L = \mathrm{SIGN}(L)\,P(d_1, d_2, \ldots, d_s)$. For each i, $1 \le i \le s$, perform the following:

if (FUNC($d_i$) $\in$ SK) → replace $d_i$ by $X \leftarrow d_i$, for some fresh variable X;

[] (FUNC($d_i$) $\notin$ SK $\land$ $d_i$ is not a variable) →

replace $d_i$ by $X \leftarrow d_i$, for some fresh variable X;

[] (FUNC($d_i$) $\notin$ SK $\land$ $d_i$ is not a variable) → skip;

[] ($d_i$ is a variable) → skip

fi

Call the resulting set of processed clauses MARK.

Comment: Replacing $d_i$ by $X \leftarrow d_i$ is just a way of marking $d_i$ with a variable name. Any argument of the form “$X \leftarrow d_i$” is called a marked argument.
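Step 3 is where the nondeterminism of U first shows up: arguments headed by a Skolem symbol are always marked, variables are never marked, and any other non-variable argument may or may not be marked. The Python sketch below enumerates every possible marking of one literal's arguments instead of choosing one. The encoding (variables as strings, terms as tuples, a marked argument X ← d as `("<-", X, d)`), the generated names `X0, X1, ...` and the function name are assumptions for illustration; a full implementation would also have to keep those names fresh across all of MARK.

```python
from itertools import count, product


def func(t):
    """FUNC: the function symbol of a term, or the term itself if a variable."""
    return t if isinstance(t, str) else t[0]


def markings(args, skolem_symbols, fresh=None):
    """Yield every argument list Step 3 could produce for one literal.
    A marked argument X <- d is encoded as ("<-", X, d)."""
    fresh = fresh if fresh is not None else count()
    options = []
    for d in args:
        if isinstance(d, str):                  # d is a variable: skip
            options.append([d])
            continue
        marked = ("<-", f"X{next(fresh)}", d)   # marking variable for d
        if func(d) in skolem_symbols:           # Skolem symbol: always mark
            options.append([marked])
        else:                                   # other symbol: mark or skip
            options.append([marked, d])
    for choice in product(*options):
        yield list(choice)


# L(b, g(t), t) from Example 1, where SK is empty: four possible markings
for args in markings([("b",), ("g", "t"), "t"], set()):
    print(args)
```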

Step 4. For every pair of marked arguments “$X \leftarrow \alpha$”, “$Y \leftarrow \beta$” in MARK do
if $\alpha, \beta$ are unifiable → unify all occurrences of X and Y;

[] $\alpha, \beta$ are unifiable → skip;

[] $\alpha, \beta$ are not unifiable → skip
fi
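Step 4 needs a test for whether two marked terms α and β are unifiable. The Python sketch below is a standard syntactic unifier with occurs check (Robinson-style), swapped in for illustration since the paper does not spell one out; it uses a hypothetical encoding (strings are variables, tuples `(symbol, args...)` are terms). Choosing between the first two branches of the guarded command, i.e. whether to merge X and Y when α and β do unify, is left to the caller.

```python
def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    """True if variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s=None):
    """Return a most general unifier extending s, or None if none exists."""
    s = dict(s or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if isinstance(x, str):
            if occurs(x, y, s):
                return None
            s[x] = y
        elif isinstance(y, str):
            if occurs(y, x, s):
                return None
            s[y] = x
        elif x[0] == y[0] and len(x) == len(y):
            stack.extend(zip(x[1:], y[1:]))
        else:
            return None
    return s


print(unify(("f", "u"), ("f", "w")))   # {'u': 'w'}
```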

Comment: In the next step, C is the set of constraints on the ordering of new existential quantifiers relative to universal quantifiers which will be introduced in Step 6. The presence of an ordered pair $(y, z)$ in C signifies that “$\exists z$” must come after “$\forall y$” in the quantifier string of the unskolemized formula.

Step 5. Let C be a set which is initially empty, and let Q be an initially empty quantifier string. Let FREE be the set of all free variables in MARK (this does not include marked arguments). For every marked argument “$x \leftarrow \alpha$” do

{Collect all marked arguments with the same variable on the left-hand side of the “$\leftarrow$” sign. Suppose these are

$x \leftarrow \alpha_1,\ x \leftarrow \alpha_2,\ \ldots,\ x \leftarrow \alpha_n$.

Let $\{y_1, y_2, \ldots, y_r\}$ be the set of all the variables occurring in $\alpha_1, \alpha_2, \ldots, \alpha_n$. Then replace “$x \leftarrow \alpha_i$”, for $1 \le i \le n$, everywhere by a new variable z (say) and add the r ordered pairs $(y_i, z)$ to C. If r = 0, place “$\exists z$” at the head of the partially completed quantifier string Q.}
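The bookkeeping of Step 5 can be sketched in Python as follows. The input is assumed (our simplification) to be a dictionary mapping each left-hand variable x to the list of terms $\alpha_1, \ldots, \alpha_n$ it marks; the output is the renaming x → z, the constraint set C and the existential prefix Q. Variables are strings, terms are tuples, and the fresh names `Z0, Z1, ...` are hypothetical.

```python
def variables(t):
    """Set of variables (strings) occurring in a term."""
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(a) for a in t[1:]))


def step5(marked):
    """marked: dict mapping each left-hand variable x to its terms α1..αn.
    Returns (renaming x -> z, constraint set C, existential prefix Q)."""
    renaming, c, q = {}, set(), []
    for k, (x, alphas) in enumerate(sorted(marked.items())):
        z = f"Z{k}"                       # hypothetical fresh-name scheme
        renaming[x] = z
        ys = set().union(*(variables(a) for a in alphas))
        c |= {(y, z) for y in ys}         # ∃z must come after ∀y for each y
        if not ys:
            q.insert(0, z)                # r = 0: ∃z goes to the head of Q
    return renaming, c, q


# Example 1: the only marked arguments are u <- g(t)
print(step5({"u": [("g", "t")]}))   # ({'u': 'Z0'}, {('t', 'Z0')}, [])
```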

Step 6. (i) For every y in FREE, define $\mathrm{DEP}(y) = \{z \mid (y, z) \in C\}$. This is the set of all variables z such that “$\exists z$” must come after “$\forall y$”. Define the partial order PO on the set FREE as follows: $(x, y) \in \mathrm{PO}$ iff $\mathrm{DEP}(x) \supset \mathrm{DEP}(y)$.

(ii) Extend the partial order PO to a linear order on FREE in all possible ways, yielding a set LIN of linear orders.

(iii) Let QUANT be an initially empty set of quantifier strings. For every linear order O in LIN do

{P := Q;

add universal quantifiers for every variable in FREE to P in the order prescribed by O (i.e. if $x < y$ in O, then $\forall x$ precedes $\forall y$ in P). These quantifiers come after any existential quantifiers already present in Q;
QUANT := QUANT ∪ {P} }

(iv) Let $\mathcal{E}$ be an initially empty set. For every Q in QUANT do

{Insert an existential quantifier for every z such that $(y, z) \in C$ (for some y) as far forward in Q as possible, subject to the constraint that “$\exists z$” comes after “$\forall y$” for every y such that $(y, z) \in C$;

Rewrite MARK in conjunctive normal form;

Add the formula “Q MARK” to the set $\mathcal{E}$. } ■
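Parts (i) to (iv) of Step 6 can be sketched in Python as a function from FREE, C and the prefix Q produced by Step 5 to the list of quantifier strings. Linear extensions of PO are found by filtering all permutations, which is only sensible for small FREE; each ∃z is placed immediately after the last ∀y it depends on. The sketch assumes every y appearing in C belongs to FREE, and it omits the CNF rewriting of MARK; the function name is hypothetical.

```python
from itertools import permutations


def quantifier_strings(free, c, q=()):
    """Sketch of Step 6: quantifier prefixes built from FREE, the constraint
    set C and the existential prefix Q left by Step 5."""
    dep = {y: {z for (y2, z) in c if y2 == y} for y in free}        # (i)
    po = {(x, y) for x in free for y in free if dep[x] > dep[y]}    # proper superset
    quant = []
    for order in permutations(sorted(free)):                        # (ii)
        pos = {v: i for i, v in enumerate(order)}
        if any(pos[x] > pos[y] for (x, y) in po):
            continue                       # not a linear extension of PO
        prefix = [("∃", z) for z in q] + [("∀", y) for y in order]  # (iii)
        for z in sorted({z for (_, z) in c}):                       # (iv)
            # as far forward as possible: just after the last required ∀y
            last = max(prefix.index(("∀", y)) for (y, z2) in c if z2 == z)
            prefix.insert(last + 1, ("∃", z))
        quant.append("".join(sym + v for sym, v in prefix))
    return quant


# Example 1: FREE = {t, x}, C = {(t, Z)}
print(quantifier_strings({"t", "x"}, {("t", "Z")}))   # ['∀t∃Z∀x']
```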

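Example 1. Let

$H = \forall x\,\forall y\,\forall z\,\forall w\,\forall t\,((Q(y) \lor L(b, y, t)) \land \neg Q(g(t)) \land L(g(t), a, t) \land (R(x, g(t)) \lor \neg P(x, g(t))) \land (\neg R(w, z) \lor \neg D(w, z)))$,

$W = \forall s\,\exists u\,\forall v\,(L(b, u, s) \land L(u, a, s) \land (\neg P(v, u) \lor \neg D(v, u) \lor M(a)))$.

It is easy to see that $H \Rightarrow W$. We will show how algorithm U derives a formula F from H such that $H \Rightarrow F \Rightarrow W$. We have

$\neg W = \exists s\,\forall u\,\exists v\,((\neg L(b, u, s) \lor \neg L(u, a, s) \lor P(v, u)) \land (\neg L(b, u, s) \lor \neg L(u, a, s) \lor D(v, u)) \land (\neg L(b, u, s) \lor \neg L(u, a, s) \lor \neg M(a)))$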


[Figure 1. Derivation of the empty clause by resolution for Example 1. Legend: resolution steps; dotted lines pair literals of $\mathcal{D}$ and Sk($\neg W$) which are resolved against each other.]


$\mathrm{Sk}(\neg W) = \{\{\neg L(b, u, c), \neg L(u, a, c), P(f(u), u)\}, \{\neg L(b, w, c), \neg L(w, a, c), D(f(w), w)\}, \{\neg L(b, w, c), \neg L(w, a, c), \neg M(a)\}\}$.

In Sk($\neg W$), $f$ and $c$ are Skolem functions replacing the existentially quantified variables $v$ and $s$ of $\neg W$, respectively. A sequence of resolutions that derives the empty clause from $\mathrm{Sk}(H) \land \mathrm{Sk}(\neg W)$ is depicted pictorially in Figure 1. Notice that resolutions among clauses of Sk(H) were performed first, and then some of these clauses were used during the remainder of the resolution process. We define the set $\mathcal{D}$ to consist of the three clauses $\{L(b, g(t), t)\}$, $\{L(g(t), a, t)\}$, $\{\neg P(x, g(t)), \neg D(x, g(t))\}$, which were obtained from Sk(H) and were used to derive the empty clause from $\mathrm{Sk}(H) \land \mathrm{Sk}(\neg W)$. These clauses are enclosed in boxes in Figure 1.

INPUT: The formula H given above.

Step 1: We define the set $\mathcal{D}$ to consist of the following three clauses, as explained above: $\{\{L(b, g(t), t)\}, \{L(g(t), a, t)\}, \{\neg P(x, g(t)), \neg D(x, g(t))\}\}$.

Step 2: Since the clauses $\{L(b, g(t), t)\}$ and $\{L(g(t), a, t)\}$ are both used twice during the resolution in Figure 1, we make two copies of each of these clauses, and one copy of the clause $\{\neg P(x, g(t)), \neg D(x, g(t))\}$. We now have the bag of clauses M_CLAUSES consisting of the following five clauses: $\{\{L(b, g(t), t)\}, \{L(b, g(t), t)\}, \{L(g(t), a, t)\}, \{L(g(t), a, t)\}, \{\neg P(x, g(t)), \neg D(x, g(t))\}\}$.

Step 3: Now “mark” arguments of M_CLAUSES as follows. Look at which literal of Sk($\neg W$) each of the above literals resolves against in Figure 1. Pairs of literals of $\mathcal{D}$ and Sk($\neg W$) that resolve against each other are linked by dotted lines in the figure. We see that $L(b, g(t), t)$ resolves against $\neg L(b, w, c)$; $L(b, g(t), t)$ resolves against $\neg L(b, u, c)$; $L(g(t), a, t)$ resolves against $\neg L(w, a, c)$; $L(g(t), a, t)$ resolves against $\neg L(u, a, c)$; $\neg P(x, g(t))$ resolves against $P(f(u), u)$; and $\neg D(x, g(t))$ resolves against $D(f(w), w)$.

Note: By looking at (for instance) the resolution between clauses 3 and 11, which yields clause 13, it appears that $L(g(t), a, t)$ (from clause 3) resolves against $\neg L(g(c), a, c)$ (from clause 11). However, the literal $\neg L(g(c), a, c)$ in clause 11 is an instance of the literal $\neg L(w, a, c)$ in clause 7. Clause 7 belongs to Sk($\neg W$) (and clause 11 doesn't). Thus the literal in Sk($\neg W$) that $L(g(t), a, t)$ resolves against is $\neg L(w, a, c)$ in this case.

For any function symbol F in a literal of M_CLAUSES which resolves against a variable X in Sk($\neg W$), mark it by replacing F by “$X \leftarrow F$”. This yields the following set of marked clauses MARK: $\{\{L(b, w \leftarrow g(t), t)\}, \{L(b, u \leftarrow g(t), t)\}, \{L(w \leftarrow g(t), a, t)\}, \{L(u \leftarrow g(t), a, t)\}, \{\neg P(x, u \leftarrow g(t)), \neg D(x, w \leftarrow g(t))\}\}$.

Step 4: We unify some of the variables on the left-hand side of the “$\leftarrow$” in marked arguments. To decide which variables will be unified, we look at variables in unmarked arguments of MARK. There are two such variables, namely $x$ and $t$. These variables were unified with $\{f(u), f(w)\}$ and $\{c\}$ respectively (see the preceding analysis). Since $x$ was unified with both $f(u)$ and $f(w)$, we unify $f(u)$ with $f(w)$. MARK now contains the five clauses $\{L(b, u \leftarrow g(t), t)\}$, $\{L(b, u \leftarrow g(t), t)\}$, $\{L(u \leftarrow g(t), a, t)\}$, $\{L(u \leftarrow g(t), a, t)\}$, $\{\neg P(x, u \leftarrow g(t)), \neg D(x, u \leftarrow g(t))\}$. Since the first and second clauses are identical, and so are the third and fourth clauses, we can drop the duplicate clauses. MARK now consists of the three clauses $\{\{L(b, u \leftarrow g(t), t)\}, \{L(u \leftarrow g(t), a, t)\}, \{\neg P(x, u \leftarrow g(t)), \neg D(x, u \leftarrow g(t))\}\}$.

Step 5: Here FREE = $\{t, x\}$. The marked arguments in MARK are all “$u \leftarrow g(t)$”. We replace these arguments by a fresh variable Z, and add the pair $(t, Z)$ to C. This yields $C = \{(t, Z)\}$, and MARK consists of the clauses $\{\{L(b, Z, t)\}, \{L(Z, a, t)\}, \{\neg P(x, Z), \neg D(x, Z)\}\}$.

Step 6: (i) Here $\mathrm{DEP}(t) = \{Z\}$, $\mathrm{DEP}(x) = \{\}$.

Since $\mathrm{DEP}(t) \supset \mathrm{DEP}(x)$, $\mathrm{PO} = \{(t, x)\}$.

(ii) The partial order PO is a linear order on FREE; thus LIN = {PO}.

(iii) QUANT = $\{\forall t\,\forall x\}$.

(iv) The existential quantifier for Z must be placed after $\forall t$, as far forward as possible; thus QUANT = $\{\forall t\,\exists Z\,\forall x\}$, and the resulting set of formulas is

$\mathcal{E} = \{\forall t\,\exists Z\,\forall x\,(L(b, Z, t) \land L(Z, a, t) \land (\neg P(x, Z) \lor \neg D(x, Z)))\}$.

Thus there is only one formula in the set $\mathcal{E}$ for this example; call it F. It can easily be verified that $H \Rightarrow F \Rightarrow W$. ■

3.  Analysis  of  the  Unskolemization  Algorithm 

This section defines the “more general than” relation and lists some theorems which prove that algorithm U satisfies its specification. Proofs for the theorems in this section can be found in [1].

Theorem 1. Let H, W be formulas such that $H \Rightarrow W$. If the right nondeterministic choices are made, given H as input, algorithm U produces a set $\mathcal{E}$ of formulas such that for any formula F in $\mathcal{E}$, for any literal $L = \mathrm{SIGN}(L)\,P(d_1, d_2, \ldots, d_s)$ of Sk(F), there exists a literal M of W such that $M = \mathrm{SIGN}(M)\,P(b_1, b_2, \ldots, b_s)$, SIGN(M) = SIGN(L), and such that for all i, $1 \le i \le s$,

(i) If $d_i$ is a Skolem function, then $b_i$ is existentially quantified in W.

(ii) If $d_i$ is a non-Skolem function, then one of the following holds:

(a) $b_i$ is the same function, and (i) and (ii) here hold recursively for each corresponding argument of $d_i$ and $b_i$.

(b) $b_i$ is existentially quantified and the function symbol of $d_i$ (with the same arity as $d_i$) appears in W. ■

Intuitively, we are trying to say that Skolem symbols in $\mathcal{D}$ (where $\mathcal{D}$ is as specified in Step 1 of algorithm U) are replaced by existentially quantified variables in W ((i) in the theorem statement), and non-Skolem function symbols in $\mathcal{D}$ which do not appear in W are also replaced by existentially quantified variables. Thus any non-Skolem function symbol that remains in an unskolemized formula F must appear somewhere in W ((ii) in the theorem statement). This is crucial because it allows us to define a relation “more general than” such that the number of formulas more general than a given formula is finite under certain syntactic constraints. Also, the fact that every literal L in Sk(F) has a corresponding literal M in W with the same predicate, arity, and sign shows that formula F is similar to W in the predicates that it contains.

Motivated  by  Theorem  1,  we  introduce  the  following  definition. 

Definition. A formula F is more general than a formula W if for every literal L of F, there exists a literal M of W such that if $L = \mathrm{SIGN}(L)\,P(a_1, a_2, \ldots, a_s)$, then $M = \mathrm{SIGN}(M)\,P(b_1, b_2, \ldots, b_s)$, where SIGN(L) = SIGN(M), and for all i such that $1 \le i \le s$,

(i) If $a_i$ is an existentially quantified variable, then so is $b_i$.

(ii) If $a_i$ is a function symbol followed by u arguments $e_1, e_2, \ldots, e_u$, then

(a) $b_i$ is the same function symbol followed by the same number of arguments, say $f_1, f_2, \ldots, f_u$, and conditions (i) and (ii) hold for every pair of arguments $e_k$ and $f_k$, $1 \le k \le u$, or

(b) $b_i$ is an existentially quantified variable and $a_i$ has a function symbol that occurs in W. ■
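A small Python checker for this definition may help in reading it. It assumes (our simplification) that each formula is given as a list of literals `(negated, predicate, [args])` together with the set of its existentially quantified variables; quantifier order plays no role, as in the definition. Variables are strings, terms are tuples `(symbol, args...)`, and in case (ii)(b) only the leading symbol of $a_i$ is tested, following Theorem 1. All names are hypothetical.

```python
def symbols(t):
    """Function and constant symbols occurring in a term (variables are str)."""
    if isinstance(t, str):
        return set()
    return {t[0]}.union(*(symbols(a) for a in t[1:]))


def arg_ok(a, b, ex_f, ex_w, w_syms):
    """Conditions (i) and (ii) of the definition for one argument pair a, b."""
    if isinstance(a, str):
        # (i): an existential variable of F must face one of W; the
        # definition places no condition on universal variables.
        return a not in ex_f or (isinstance(b, str) and b in ex_w)
    if isinstance(b, str):
        # (ii)(b): b is existential and a's function symbol occurs in W
        return b in ex_w and a[0] in w_syms
    # (ii)(a): same symbol and arity, arguments pairwise acceptable
    return (a[0] == b[0] and len(a) == len(b) and
            all(arg_ok(e, f, ex_f, ex_w, w_syms) for e, f in zip(a[1:], b[1:])))


def more_general(f_lits, w_lits, ex_f, ex_w):
    """Literals are (negated, predicate, [args]); ex_* are existential vars."""
    w_syms = set().union(*(symbols(a) for _, _, args in w_lits for a in args))

    def matches(l, m):
        return (l[0] == m[0] and l[1] == m[1] and len(l[2]) == len(m[2]) and
                all(arg_ok(a, b, ex_f, ex_w, w_syms) for a, b in zip(l[2], m[2])))

    return all(any(matches(l, m) for m in w_lits) for l in f_lits)


# F = ∀x∃y∀z(P(x,y) ∧ Q(y,z)) against W = ∀u∃v(P(u,v) ∧ Q(v,v))
F = [(False, "P", ["x", "y"]), (False, "Q", ["y", "z"])]
W = [(False, "P", ["u", "v"]), (False, "Q", ["v", "v"])]
print(more_general(F, W, {"y"}, {"v"}))       # True
print(more_general(F, W[:1], {"y"}, {"v"}))   # False: W = ∀u∃vP(u,v) has no Q
```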

Note the similarity between the statement of Theorem 1 and this definition. The definition of “more general than” given above does not allow function symbols that do not appear in W to appear in F if F is more general than W.

Corollary to Theorem 1. For every $F \in \mathcal{E}$, F is more general than W.

Definition. Let F, W be two first-order formulas. We say that $F \preceq W$ iff

(i)  F  is  more  general  than  W 

(ii)  F  =>  W. 

Example 3. We illustrate the meaning of “more general than”.

(i) $F = \forall x\,\exists y\,\forall z\,(P(x, y) \land Q(y, z))$, $W = \forall u\,\exists v\,P(u, v)$.

F is not more general than W because for the literal $Q(y, z)$ in F, there is no literal in W with the specified properties, since W does not even have a literal with predicate symbol Q.

(ii) $F = \forall x\,\exists y\,\forall z\,(P(x, y) \land Q(y, z))$, $W = \forall u\,\exists v\,(P(u, v) \land Q(v, v))$.

Here F is more general than W, because for $P(x, y)$ in F, there is a corresponding literal $P(u, v)$ in W with the specified properties; similarly, for $Q(y, z)$ in F, there is a corresponding literal $Q(v, v)$ in W with the specified properties. Also, $F \Rightarrow W$; therefore $F \preceq W$.

(iii) $F = \forall x\,\exists y\,\forall z\,(P(x, y) \lor Q(y, z))$, $W = \forall u\,\exists v\,(P(u, v) \land Q(v, v))$.

As in (ii), F is more general than W. However, $F \not\Rightarrow W$; therefore $F \not\preceq W$. ■

Theorem 2. $\{F \mid F \preceq W\}$ is finite up to variants, assuming that if F is written in conjunctive normal form, then no two disjunctions of F are identical, and no disjunction of F contains more than one occurrence of the same literal. ■


Theorem 3. For every $F \in \mathcal{E}$, $H \Rightarrow F$, where $\mathcal{E}$ is the set of formulas obtained by unskolemizing a formula H according to algorithm U. ■

Theorem 4. Given formulas H, W such that $H \Rightarrow W$, there exists $F \in \mathcal{E}$ such that $F \Rightarrow W$, where H and $\mathcal{E}$ are the input and output of algorithm U, respectively. ■

Corollary to Theorem 4: There exists $F \in \mathcal{E}$ such that $F \preceq W$. ■

4.  Applications 

The unskolemization algorithm discussed in this paper was originally developed as part of a mechanism for deriving loop invariants for program loops, with a view to automatically proving the partial correctness of programs. In Floyd’s inductive assertions method for proving the partial correctness of programs [4], the user is required to supply loop invariants for every loop in the program. Although some attempts have been made in the past to mechanize the derivation of loop invariants, most of the methods developed are heuristic in nature (for two basic heuristic approaches, see [6,10]). We can describe the problem of generating loop invariants as one of generating logical consequences of an infinite number of formulas. To see this, suppose that W is a loop invariant for a program loop at the entry to the loop. Let $A_i$ be the formula which holds before the i-th iteration of the loop. Then $A_i \Rightarrow W$ for all $i \ge 1$. We have $A_1 \Rightarrow W$, $A_2 \Rightarrow W$, $A_3 \Rightarrow W$, and so on, i.e. W is a logical consequence of each $A_i$, for all $i \ge 1$. Thus the method described in this paper for deriving logical consequences can be used for deriving W.


As a quick and simple example, consider the program in Figure 2, which multiplies two numbers by successive addition. Suppose we are trying to find a loop invariant for the loop of this program, which would always be true at point B. The input and output assertions for this program are attached at points A and C respectively. Using the preceding terminology, if W is a loop invariant for the program at point B, then the formulas $A_1, A_2, \ldots$ are:

$A_1 = b > 0 \land x = 0 \land y = b$
$A_2 = b > 0 \land x = a \land y = b - 1$
$A_3 = b > 0 \land x = 2 \cdot a \land y = b - 2$
$A_4 = b > 0 \land x = 3 \cdot a \land y = b - 3$

and so on. Let us unskolemize $A_4$ by replacing the (non-Skolem) symbol “3” by an existentially quantified variable. We get

$\exists m\,(b > 0 \land x = m \cdot a \land y = b - m)$

which is a loop invariant for this loop and can be used to prove the partial
correctness of this program. We hope that the simplicity of this example does
not obscure its intent, which is to illustrate the need for unskolemization in
situations like the above. The algorithm we have developed for deriving
loop invariants guides the unskolemization algorithm in deciding
which symbols should be unskolemized. A complete description of this algorithm
can be found in [1]. We have developed a complete algorithm for deriving loop
invariants within the framework of first-order logic. In other words, if a loop
invariant can be expressed in first-order logic, then our algorithm can generate a
loop invariant, using resolution and unskolemization, which can be used to prove
the partial correctness of the program. A proof of this is included in [1]. This
proof rests on the crucial fact, stated in Theorem 2, that the set of formulas F
that are "close" to W is finite. Suppose an algorithm for deriving loop invariants
continually generates approximations F to a loop invariant W that are close to W
in this sense. Since only a finite number of such F's exist, the algorithm converges
and finds a suitable loop invariant.
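Figure 2 is not reproduced here. The following Python sketch therefore assumes a loop consistent with $A_1, A_2, \ldots$: x starts at 0 and y at b, and each iteration adds a to x and decrements y until y reaches 0. The exit test and the names `multiply` and `invariant_holds` are hypothetical. The sketch checks the unskolemized invariant at point B on every pass. In the i-th state the witness for m is i - 1, which is exactly the constant that unskolemizing $A_4$ abstracted into a variable.

```python
def invariant_holds(a: int, b: int, x: int, y: int) -> bool:
    """Decide  exists m. (b > 0 and x = m*a and y = b - m)  over the integers."""
    m = b - y  # the only possible witness, forced by y = b - m
    return b > 0 and x == m * a


def multiply(a: int, b: int) -> int:
    """Hypothetical reconstruction of the Figure 2 loop (multiplication by successive addition)."""
    assert b > 0                   # assumed input assertion at point A
    x, y = 0, b                    # the state described by A_1
    while True:
        # Point B: the candidate invariant must hold before every loop test.
        assert invariant_holds(a, b, x, y), (x, y)
        if y == 0:                 # assumed exit test
            break
        x, y = x + a, y - 1        # one more addition of a
    return x                       # point C: x = a * b


print(multiply(7, 4))  # 28
```

Because y = b - m fixes the witness, the existential can be decided without search. A loop that never checked the invariant would still compute the product; the assertion only documents the proof obligation at point B.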

The method for deriving logical consequences described here has also been
applied to learning characteristic descriptions from examples. A characteristic
description is a description of a collection of objects, situations or events which
states facts that are true of all objects in the class. More formally, a statement
S is a description of objects $O_1, O_2, O_3, \ldots$ if $O_1 \Rightarrow S$, $O_2 \Rightarrow S$, $O_3 \Rightarrow S$,
and so on. This provides a straightforward application of the method described
in this paper, since the statement S to be derived is a logical consequence of each of
$O_1, O_2, O_3, \ldots$. Another very simple example illustrating the use of unskolemization
is the following: consider the two sets of objects depicted in Figure 3.
The objects in each set can be described in first-order logic as:
The  objects  in  each  set  can  be  described  in  first-order  logic  as: 

$SET_1 = blank(a) \land small(a) \land square(a) \land blank(b) \land large(b) \land circle(b)$

$SET_2 = blank(c) \land small(c) \land circle(c) \land shaded(d) \land large(d) \land circle(d)$


Figure 3. Learning object descriptions (panels: SET 1, SET 2)


Our learning algorithm then uses a maximum-matching algorithm for bipartite
graphs to determine which clauses and symbols should be unskolemized. The
resulting formula, in this case, is:

$\exists x \exists y\,(blank(x) \land small(x) \land large(y) \land circle(y))$

In other words, each set of objects contains a small blank object and a large
circle. A description of a learning algorithm which makes use of the unskolemization
algorithm is given in [1]. This learning algorithm builds on the work of
this paper, since it provides a means of eliminating some of the nondeterminism
of the unskolemization algorithm by finding structural similarities between
examples.
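$SET_1$ and $SET_2$ are conjunctions of ground atoms. An existential conjunction such as the derived description therefore follows from one of them exactly when some assignment of that set's constants to x and y turns every atom of the description into one of the given facts. The Python sketch below checks this by brute force. The names are our own and hypothetical. The sketch only verifies a candidate description; it does not reproduce the authors' matching-based learning algorithm.

```python
from itertools import product

# Ground atoms of Figure 3, written as (predicate, constant) pairs.
SET1 = {("blank", "a"), ("small", "a"), ("square", "a"),
        ("blank", "b"), ("large", "b"), ("circle", "b")}
SET2 = {("blank", "c"), ("small", "c"), ("circle", "c"),
        ("shaded", "d"), ("large", "d"), ("circle", "d")}


def entails_description(facts, x_preds, y_preds):
    """Does the conjunction `facts` entail  exists x, y (x_preds hold of x and y_preds hold of y)?"""
    constants = {c for _, c in facts}
    return any(
        all((p, x) in facts for p in x_preds) and all((p, y) in facts for p in y_preds)
        for x, y in product(constants, repeat=2)
    )


# exists x exists y (blank(x) & small(x) & large(y) & circle(y))
description = (["blank", "small"], ["large", "circle"])
print([entails_description(s, *description) for s in (SET1, SET2)])  # [True, True]
print(entails_description(SET2, ["square"], []))                     # False
```

The second call shows why square(x) could not be part of the common description: it holds in $SET_1$ but not in $SET_2$.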

Finally, we have applied our method for deriving logical consequences to
the mechanization of mathematical induction in first-order logic theorem provers.
The principle of mathematical induction cannot be expressed in first-order logic.
Therefore, to enable first-order logic theorem provers to prove theorems
requiring the induction principle, the provers must be supplied with the
inductive hypotheses they will need in order to prove a
theorem. These hypotheses can be derived as negations of logical consequences of
$AXIOMS \land \neg T$, where AXIOMS is a set of axioms for the domain involved, and T
is the theorem to be proved. Thus our method for deriving logical consequences
can be used for this task. The method is described in [1].

5.  Summary 

This paper has presented a method for deriving logical consequences of
first-order formulas based on resolution and unskolemization. A number of
applications (such as those described in the previous section) require the generation
of logical consequences of first-order formulas. It was shown that it is impossible
to generate all possible logical consequences of a first-order formula without
using tautologies or unskolemization. We therefore presented an unskolemization
algorithm that, given a first-order formula, can derive formulas F which are
"close" to logical consequences W of that formula. A measure of closeness is defined using
a "more general than" relation. The suitability of our measure is demonstrated
by proving that, by our definition, only a finite number of formulas are "close" to any particular
formula. We would like to point out that although the
algorithm contains some nondeterminism, much of this nondeterminism is removed
when the algorithm is used for the applications described in Section 4. The
nondeterminism of the algorithm gives it the generality to be applied
in a variety of different ways.

References 

1. R. Chadha: Applications of Unskolemization. PhD dissertation, Dept. of
Computer Science, Univ. of North Carolina, Chapel Hill (1991).

2. C. Chang, R.C.T. Lee: Symbolic Logic and Mechanical Theorem Proving. Academic
Press Inc., New York (1973).

3. P.T. Cox, T. Pietrzykowski: A complete, nonredundant algorithm for reversed
skolemization. Theoretical Computer Science 28, 239-261 (1984).

4. R.W. Floyd: Assigning meanings to programs. Proc. Symp. on Applied
Mathematics, American Mathematical Society, 19, 19-32 (1967).

5. D. Gries: The Science of Programming. Springer-Verlag (1981).

6. S.M. Katz, Z. Manna: A heuristic approach to program verification. Proc.
Third Intl. Joint Conf. on Artificial Intelligence, 500-512 (1973).

7. D. Loveland: Automated Theorem Proving: A Logical Basis. North-Holland
Publishing Co. (1978).

8. W.W. McCune: Un-Skolemizing clause sets. Information Processing Letters
29, 257-263 (1988).

9. J.A. Robinson: A machine-oriented logic based on the resolution principle.
Journal of the ACM 12(1), 23-41 (1965).

10. B. Wegbreit: Heuristic methods for mechanically deriving inductive assertions.
Proc. Third Intl. Joint Conf. on Artificial Intelligence (1973).


Controlled  Explanation  Systems 


Arcot  Rajasekar 

Department  of  Computer  Science,  University  of  Kentucky,  Lexington,  KY  40506 


Abstract. In this paper we extend the definition of explanations to controlled
explanations. Traditionally, given a set of facts and a theory, a
set of explanations is generated which can be used to infer the facts.
When there is more than one such explanation, some
criterion of minimality or preference is adopted to select among the
explanations. Most of these selection criteria are syntax-based and
domain-independent. In this paper, we define a system in which the selection
can be made using domain-dependent criteria. We motivate
and define controlled explanations, show some of their properties and
provide a procedure which generates minimal controlled explanations.


1  Introduction 

Explanation systems form the heart of several artificial intelligence systems such
as truth maintenance systems [3, 1], abductive reasoning systems [7], diagnostic
systems [8, 2] and expert systems. The underlying theme of all these systems is
to explain a set of facts using a subset of a given theory. In some cases, such
as circuit-fault diagnosis, the explanation system is used to explain
an abnormal set of observations in terms of defective (or abnormal) parts in the
system. In other cases, such as abductive reasoning used in medical diagnosis, the
aim is to find a set of diseases which can explain an observed set of symptoms.
In still other cases, such as a calculus tutoring system, it is required to find all
methods of solving a given problem. In each case, one is normally interested
not in every possible explanation (their number may be overwhelming), but
only in a small set of explanations which have some preferred property. There
are two distinct methods used in selecting explanations. The first method of
preference is based on probabilistic criteria such as confidence factors. The
second method of selection is based on set-theoretic criteria such as cardinality-minimal
explanations (as in generalized set covering (GSC) explanations [7]) or
set-minimal explanations (as in the minimal diagnoses of [8]). Both these methods
have their advantages and disadvantages. The probabilistic preference systems
provide semantic selection, but they have problems with transitivity and with
providing numeric confidence factors. The set-theoretic criteria are syntactic and
do not use any semantic information for ordering their explanations; the user is
left to choose among the final set.

In this paper, we are interested in semantic-based, domain-dependent criteria
which can be used in conjunction with set-theoretic criteria to reduce the number
of explanations that are generated. This semantic information, which controls
the production of explanations, is provided by the user in addition to the facts
that need to be explained. The semantic information is either about what should
be part of an explanation or about what cannot be part of an explanation. The
first type can be termed coercion and the second restriction. Both of them
control the explanation generation process.

The controlled explanation approach can be illustrated as follows. Consider a
doctor who is examining a patient and finds the symptoms of headache and runny
nose. He also finds that the patient does not have any chest congestion. Now
there are a number of causes for headache and runny nose: common cold, migraine,
influenza, sinusitis, pneumonia, etc. Of these, only migraine and sinus headaches
do not involve chest congestion, which can lead to a runny nose*. Hence, the
doctor would factor the absence of chest congestion into his analysis to
eliminate all other explanations except migraine and sinusitis. Logically, one
can look at the situation as follows. There are three rules for runny noses (in
our database) and one of them is inapplicable. Hence any explanation based on
that rule is bound to be wrong and should not be generated. Thus the observation
of no chest congestion is a restriction condition that reduces our set of
explanations.

Further, assume that the patient says that the headache is throbbing. The doctor
can use this additional factor to prefer an explanation which uses
this fact over others: possibly sinusitis over migraine. Note that the
headache being throbbing is not an important fact, i.e., not important enough to
form part of the symptom list. However, if an explanation uses this fact, then that
explanation is all the more preferable. Logically, throbbing headache is a kind
of headache. Simply using throbbing headache instead of headache as part of the
symptoms might shut out a lot of explanations. For example, a headache due
to influenza may or may not be throbbing, and using the symptom throbbing
headache may shut out this diagnosis even if it is supported by other
symptoms. Hence using throbbing headache to prefer a particular diagnosis can be
seen as the use of coercion to find a preferable diagnosis.

In this paper we develop a syntax and semantics for controlled explanations.
We restrict our attention to the restriction type of control and provide a proof
procedure for obtaining such explanations. The extension of the system to
coercion-type control is discussed elsewhere [6]. In the next section we develop
the syntax and semantics for restriction-controlled explanations. In Section 3
we define criteria for selecting among such explanations, and in Section 4 we
provide a proof procedure for obtaining them. We conclude the paper with a
discussion in Section 5.


2  Controlled  Explanation  -  Definition  and  Properties 

In this paper we develop a theory of controlled explanation based on the
logic programming framework. However, the approach is general enough to
be adapted to other types of explanation systems such as cause-effect systems
[7], default-theory-based systems [5], and other diagnostic systems [8, 2].

* Runny nose in migraine headache is probably caused by watering of the eyes.
In sinusitis it may be due to discharges.


We assume that the theory T which is the basis for generating explanations
is made of normal Horn clauses [4], i.e., clauses with negation in the body, which
is treated as a non-classical negation. We also assume that the reasoning
mechanism uses first-order inference axioms augmented with some meta-rule for
treating negation. For example, the closed world assumption (or one of its extended forms)
may be an appropriate mechanism for treating negation. The consequences of
T under some reasoning model (mechanism) $\mathcal{R}$ are denoted by $Cn_{\mathcal{R}}(T)$. That is,
$Cn_{\mathcal{R}}(T) = \{m : T \models_{\mathcal{R}} m\}$, where the mechanism depends upon the
underlying theory and axiom schema being used. If the language $\mathcal{L}$ is propositional, then the axioms
of propositional calculus provide one such mechanism. In this paper, since we are
concerned with theories which are made of Horn programs, the mechanism $\mathcal{R}$
is given by the declarative semantics of various classes of logic programs. Examples
are the least model semantics for Horn programs, the supported model for stratified
programs, the perfect model semantics for locally stratified programs, and the stable model
semantics or well-founded semantics for normal logic programs. The definition of
a controlled explanation is general enough to apply to many types of theories.

We also have a notion opposite to that of consequences: that of non-sequiturs.
We denote the non-sequiturs of T under $\mathcal{R}$ by $Ns_{\mathcal{R}}(T)$. That is, $Ns_{\mathcal{R}}(T) = \{m :
T \not\models_{\mathcal{R}} m\}$. The twin concepts of consequences and non-sequiturs provide a
basis for developing a theory of explanations. We first require the notion of an
inference pair. An inference pair is a two-tuple $\langle s, u \rangle$, where s (resp. u) is
a subset of $\mathcal{L}$ and is called the consequence part (resp. non-sequitur part) that
needs to be explained. Any explanation S for the inference pair $\langle s, u \rangle$ is
a subset of a larger theory T such that $s \subseteq Cn_{\mathcal{R}}(S)$ and $u \subseteq Ns_{\mathcal{R}}(S)$.
In the case of normal Horn programs, $Cn_{\mathcal{R}}(T)$ consists of a set of atoms and
$Ns_{\mathcal{R}}(T)$ consists of a set of atoms and negated atoms. Whenever one needs to
explain a phenomenon or observation (say p), p should be in the consequence
part of the inference pair. Whenever one needs the explanation to refrain
from using another phenomenon (say q), q should be
in the non-sequitur part of the inference pair. For example, for the headache
patient of the previous section, we can use the inference pair
$\langle \{headache, runny\_nose\}, \{chest\_congestion\} \rangle$.

The non-sequitur part of an inference pair serves an important purpose.
Including $q$ as a non-sequitur is semantically very different from including $\neg q$ in
the consequence part. The latter requires that there be an explanation for $\neg q$, whereas
the former only requires that a given explanation not infer $q$. Suppose that
the doctor notices that a patient does not have a high body temperature and
includes non-high body temperature in the consequence set to be
explained. Then the doctor needs an explicit explanation for the
temperature not being high, which may be impossible to provide (since there
are too many cases where the temperature need not be high). Including high
body temperature as a non-sequitur avoids the need to explain non-high body
temperature while still making sure that high body temperature is not inferred from the
diagnosis. The concept of non-sequitur is helpful in two ways: first, in avoiding
explanation of the obvious, as in the above example, and second, in constraining
explanations. For example, suppose one wants to find a path from point a to point
b, but not through point c. Then having path(a,b) as part of the consequence and
path(a,c) as part of the non-sequitur constrains any plan (or explanation) generated
to avoid going through c.

The consequence and the non-sequitur parts of an inference pair $\langle s, u \rangle$
have semantically different meanings with respect to a theory T. Let S be an
explanation for $\langle s, u \rangle$. Whenever $p \in s$, the explanation for p in S
should be consistent with T. That is, if S infers p then T should also infer p.
Moreover, if S uses a line of reasoning to infer p, then the same line of reasoning
should also be usable in T to explain p. On the other hand, whenever $q \in u$, S
does not infer q, but it is possible that T infers q. That is, the lack of inference
for q in S need not be consistent with T. In the example of a path from point
a to point b, there may be paths from a to c in the graph under
consideration, but our plan (explanation) does not contain those paths.

Next, we make precise the definition of an explanation of an inference pair.
(From now on we omit the subscript $\mathcal{R}$ on $\models$.) We begin with the definition of a
weak explanation. A weak explanation for an inference pair (s, u) can be seen as
any subset of T which explains s and does not explain any element of u. However, other positive
inferences made from the explanation may not be consistent with the theory T.

Definition 1 A weak explanation e for an inference pair (s, u) from a theory T
is defined as follows:

1) $e \subseteq T$,

2) $e \models s$,

3) $\forall b \in u,\ e \not\models b$,

4) $\forall k\,((e \cup k \not\models s) \Rightarrow (T \not\models k))$, where k is a set of atoms.

The set of weak explanations for a pair (s,u) from a theory T is denoted as
$E^T_w(s,u)$.

When $E^T_w(s,u)$ is empty, the pair (s,u) is not explainable from T.
Item (1) states that explanations are formed from the initial theory;
that is, no external facts are part of an explanation. Items (2) and (3)
state that an explanation should conform to what it explains: it should derive s
and should not derive any element of u. Item (4) states that e is
a consistent explanation for s with respect to the underlying theory T. Items (1) and (4)
are vital, since they ensure that the explanation for s is consistent with T. Item (4)
ensures that whenever the negation of an atom needs to hold in e for the proof
of s, that negation also holds in T. Item (1) ensures that the axioms used
in the derivation of s also occur in T. Together, Items (1) and (4) ensure that
the derivation of s in e can be mimicked in T. The purpose of (4) can be seen
from the following example and Lemma 1.

Example 1 Consider a theory $T = \{a \leftarrow b, \neg c;\ a \leftarrow b, \neg d;\ k \leftarrow \neg c;\ b;\ c\}$
and the pair $(\{a\}, \emptyset)$. Assume that the reasoning model is given as in logic
programming, i.e., SLDNF-resolution. If Item (4) is ignored in Definition 1, then
$\{a \leftarrow b, \neg c;\ b\}$ is a possible explanation for $(\{a\}, \emptyset)$. However, it is
inconsistent with T: there is no SLDNF-derivation of a in T using $a \leftarrow b, \neg c$ and b,
since c is provable in T. Including Item (4) in the definition avoids this inconsistent
explanation.

Note that the definition of weak explanation does not prevent the inference of other
information which may be inconsistent with T. In the above example,
$\{a \leftarrow b, \neg d;\ k \leftarrow \neg c;\ b\}$ is a weak explanation for $(\{a\}, \emptyset)$. The atom k is derivable
from this explanation using SLDNF-resolution, whereas k is not derivable
from T. The derivation of such inconsistent results from an explanation can be
avoided using the following definition of a strong explanation.

Definition 2 A strong explanation e for an inference pair (s,u) from a theory
T is defined as follows:

Items 1) through 4) given in Definition 1,

5) $\forall t\,(e \models t \Rightarrow T \models t)$, where t is any arbitrary set of positive inferences.

The set of strong explanations for a pair (s,u) from a theory T is denoted as
$E^T_s(s,u)$.
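Definitions 1 and 2 can be checked mechanically on small propositional programs. The Python sketch below makes our own assumptions. It takes the reasoning model $\mathcal{R}$ to be the perfect (stratified) model semantics, one of the semantics listed above, rather than SLDNF-resolution as in Example 1; on this example the two agree. It reads $e \models s$ as "s is contained in the perfect model of e", and lets k in Item 4) range over the atoms occurring in T. All function names are hypothetical.

```python
from itertools import chain, combinations


def rule(head, pos=(), neg=()):
    """A propositional normal rule  head <- pos, not neg  (facts have empty bodies)."""
    return (head, frozenset(pos), frozenset(neg))


def atoms_of(rules):
    return {h for h, _, _ in rules} | {a for _, p, n in rules for a in p | n}


def perfect_model(rules):
    """Iterated least-fixpoint (perfect) model of a stratified propositional program."""
    atoms = atoms_of(rules)
    level = dict.fromkeys(atoms, 0)
    changed = True
    while changed:  # compute a stratification: negated atoms live in lower strata
        changed = False
        for head, pos, neg in rules:
            need = max([level[a] for a in pos] + [level[a] + 1 for a in neg], default=0)
            if need > level[head]:
                if need > len(atoms):
                    raise ValueError("program is not stratified")
                level[head], changed = need, True
    true = set()
    for stratum in sorted(set(level.values())):
        layer = [r for r in rules if level[r[0]] == stratum]
        changed = True
        while changed:  # least fixpoint of this stratum; lower strata are final
            changed = False
            for head, pos, neg in layer:
                if head not in true and pos <= true and not (neg & true):
                    true.add(head)
                    changed = True
    return true


def entails(rules, atoms):
    return set(atoms) <= perfect_model(rules)


def is_weak(e, T, s, u):
    """Items 1)-4) of Definition 1; in item 4) k ranges over sets of atoms of T."""
    if not set(e) <= set(T) or not entails(e, s):  # items 1) and 2)
        return False
    if any(entails(e, {b}) for b in u):            # item 3)
        return False
    atoms = sorted(atoms_of(T))
    for k in chain.from_iterable(combinations(atoms, n) for n in range(len(atoms) + 1)):
        if not entails(list(e) + [rule(a) for a in k], s) and entails(T, k):
            return False                           # item 4) violated
    return True


def is_strong(e, T, s, u):
    """Definition 2: additionally, every positive inference of e is one of T."""
    return is_weak(e, T, s, u) and perfect_model(e) <= perfect_model(T)


# Example 1:  T = {a <- b, not c;  a <- b, not d;  k <- not c;  b;  c}
T = [rule("a", ["b"], ["c"]), rule("a", ["b"], ["d"]), rule("k", neg=["c"]),
     rule("b"), rule("c")]
e1 = [T[0], T[3]]        # {a <- b, not c; b}
e2 = [T[1], T[2], T[3]]  # {a <- b, not d; k <- not c; b}
print(is_weak(e1, T, {"a"}, set()), is_weak(e2, T, {"a"}, set()),
      is_strong(e2, T, {"a"}, set()))  # False True False
```

The three results mirror the discussion above. $e_1$ fails Item 4) because adding c blocks the proof of a while T proves c. $e_2$ is weak but not strong, since it derives k and T does not. Enumerating all sets k is exponential and only suitable for toy programs.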

The strengthening by Item (5) biases the definition of strong explanation
towards positive inferences and against negation. A strong explanation has the
attractive property that all positive inferences made from it are consistent
with T. Even so, constructing all strong explanations is more complicated than
constructing all weak explanations. In practical cases, however,
we are interested in constructing only a subset of the explanations, particularly
minimal explanations, and such explanations are strong (Theorem 1). Note that a strong
explanation is also a weak explanation but not vice versa. Next we study some
properties of weak explanations which are also inherited by strong explanations.
The proofs of these properties can be found in [6].

Lemma 1 If e is a weak explanation for an inference pair (s,u) from a theory
T then $T \models s$.

When a theory T is consistent and there is an explanation for a fact from T,
there is no explanation for its negation. (We define $\neg s = \{\neg C \mid C \in s\}$.)

Lemma 2 Let T be a consistent theory and let (s,u) be a pair.

Then $E^T_w(s,u) \neq \emptyset \Rightarrow E^T_w(\neg s, \emptyset) = \emptyset$.

An explanation for a given inference pair should not be affected by reducing
the theory T to a limited extent. Likewise, expanding an inference pair to a
limited extent should not affect the explanation. These robustness properties
of an explanation are captured by the following lemma:

Lemma 3 Let T be a theory. Let (s,u) be a pair and $e \in E^T_w(s,u)$. Then the
following hold:

1) If $e \subseteq T' \subseteq T$ then $e \in E^{T'}_w(s,u)$.

2) If $e \not\models a$ then $e \in E^T_w(s, u \cup \{a\})$.

Note that $e \models t$ does not imply $e \in E^T_w(s \cup t, u)$. But for strong explanations we
have:



Lemma 4 Let T be a theory. Let (s,u) be a pair and $e \in E^T_s(s,u)$. If $e \models t$
then $e \in E^T_s(s \cup t, u)$.

Next we derive some useful results concerning the non-existence of an explanation.
Consider explanations which lack the non-sequitur component, as in traditional
explanations. For these, if T proves the set of facts that need to be explained,
then there is an explanation for that set of facts. This does not hold for
controlled explanations. If every explanation from T is constrained by some atom
in the non-sequitur part, then no controlled explanation is possible.

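Lemma 5 Let T be a theory and let (s, u) be a pair. Then
$E^T_w(s,u) = \emptyset \Leftrightarrow \big(\forall T' \subseteq T,\ (T' \models s \Rightarrow \exists b \in u\ (T' \models \{b\}))\big)$

The following corollary shows when the constraining action of u has no effect.

Corollary 1 If $T \not\models b$ for all $b \in u$, then

1) $T \models s \Rightarrow E^T_w(s,u) \neq \emptyset$,

2) $T \models s \Rightarrow T \in E^T_w(s,u)$ and $T \in E^T_s(s,u)$.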

3  Selection  of  Explanations 

Even  though  the  use  of  constraints  restricts  the  number  of  explanations  that 
are  generated,  one  can  still  arrive  at  more  than  one  explanation  for  a  given 
phenomenon.  In  such  cases,  we  adapt  the  techniques  which  are  traditionally  used 
for  ordering  the  explanations  and  preferring  some  over  others.  In  this  section, 
we  define  two  such  types  of  explanations,  set-minimal  controlled  explanations 
and  cardinality-minimal  controlled  explanations.  In  [6]  we  develop  other  types 
of  selection  criteria. 

One of the main issues in explanation generation is that of generating minimal
explanations, that is, explanations which use the smallest possible part
of the theory to explain an inference pair. The rationale behind the criterion of
minimality is that one prefers the simplest explanation possible: the principle of
"Occam's Razor". There are several ways of defining minimality. We start with
one such definition:

Definition 3 Let T be a theory and let (s,u) be an inference pair. Let $t \in
E^T_w(s,u)$. Then t is called a set-minimal explanation for (s,u) using T if there
is no other explanation $t' \in E^T_w(s,u)$ such that $t' \subset t$.

For a given inference pair, there can be more than one set-minimal explanation,
and we refer to the set of set-minimal explanations for (s,u) using T as
$E^T_{smin}(s,u)$. It can be seen that $E^T_{smin}(s,u) \subseteq E^T_w(s,u)$.

Example 2 Consider the theory T given as
(1) pavement_wet ← rained
(2) grass_wet ← rained
(3) grass_wet ← sprinkler_on
(4) rained
(5) sprinkler_on

We want to explain the inference pair $(s,u) = (\{grass\_wet\}, \emptyset)$ using T. The
following explanations are set-minimal:

$e_1 = \{(2), (4)\}$

$e_2 = \{(3), (5)\}$

By contrast, the following explanation is not set-minimal:

$e_3 = \{(1), (2), (4)\}$

In fact, there are 14 explanations for (s,u) using T, but only $e_1$ and $e_2$ are
set-minimal.
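The count of 14 can be confirmed by brute force. The Python sketch below is our own illustration, and all names in it are hypothetical. It enumerates every subset of the five rules, keeps those whose least model contains grass_wet and avoids every non-sequitur, and then filters out the set-minimal ones. Since T has no negation, Item 4) of Definition 1 is automatically satisfied, so the weak explanations are exactly these subsets. Exhaustive enumeration is exponential in |T| and is only meant for checking small examples.

```python
from itertools import combinations

# Example 2: rule number -> (head, set of body atoms).
T = {
    1: ("pavement_wet", {"rained"}),
    2: ("grass_wet", {"rained"}),
    3: ("grass_wet", {"sprinkler_on"}),
    4: ("rained", set()),
    5: ("sprinkler_on", set()),
}


def least_model(rule_ids):
    """Least model of the Horn rules with the given numbers (naive forward chaining)."""
    true, changed = set(), True
    while changed:
        changed = False
        for i in rule_ids:
            head, body = T[i]
            if head not in true and body <= true:
                true.add(head)
                changed = True
    return true


def explanations(s, u):
    """All subsets e of T with e |= s and e |/= b for every b in u.

    Item 4) of Definition 1 holds trivially: without negation, e |= s implies e + k |= s.
    """
    found = []
    for n in range(len(T) + 1):
        for e in combinations(sorted(T), n):
            model = least_model(e)
            if s <= model and not (u & model):
                found.append(frozenset(e))
    return found


def set_minimal(exps):
    """Keep the explanations that have no proper subset among `exps`."""
    return [e for e in exps if not any(f < e for f in exps)]


weak = explanations({"grass_wet"}, set())
print(len(weak), sorted(sorted(e) for e in set_minimal(weak)))  # 14 [[2, 4], [3, 5]]

# A controlled variant: the non-sequitur `rained` must not be inferred.
controlled = explanations({"grass_wet"}, {"rained"})
print(len(controlled), sorted(sorted(e) for e in set_minimal(controlled)))  # 4 [[3, 5]]
```

The second call adds the non-sequitur rained. Only explanations built from rules (3) and (5) without rule (4) survive. This illustrates that constraining u can only remove explanations.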

A necessary and sufficient condition for the non-existence of a set-minimal
controlled explanation is provided by the following lemma.

Lemma 6 Let T be a theory and let (s,u) be an inference pair.

Then $E^T_{smin}(s,u) = \emptyset$ if and only if $E^T_w(s,u) = \emptyset$.

The following theorem shows that a set-minimal explanation is also a strong
explanation.

Theorem 1 Let T be a theory and let (s, u) be an inference pair.

If $e \in E^T_{smin}(s,u)$ then $e \in E^T_s(s,u)$.

Next  we  provide  the  definition  for  cardinality-minimal  controlled  explanation. 

Definition 4 Let T be a theory, let (s,u) be an inference pair and let $t \in E^T_w(s,u)$.
Then t is called a size-minimal explanation (or cardinality-minimal explanation)
for (s,u) using T if there is no other explanation $t' \in E^T_w(s,u)$ such that
$|t'| < |t|$.

For a given inference pair, there can be more than one size-minimal explanation,
and we refer to the set of size-minimal (or cardinality-minimal) explanations
for (s,u) using T as $E^T_{cmin}(s,u)$. In Example 2, the explanations $e_1$ and $e_2$ are
also size-minimal explanations for (s, u) using T, since all other explanations
have a higher cardinality. The following lemma shows the relationship between
set-minimal and size-minimal explanations.

Lemma 7 Let T be a theory and let (s, u) be an inference pair.

Then $E^T_{cmin}(s,u) \subseteq E^T_{smin}(s,u)$.

4  Minimal  Explanation  Systems 

In the previous sections we saw the definitions of weak, strong, set-minimal and
cardinality-minimal controlled explanations. In this section, we provide a
procedure for generating set-minimal controlled explanations. In [6] we provide a
procedure for generating weak explanations and cardinality-minimal explanations.



We restrict our attention in this section to propositional Horn theories. This
restriction can be lifted without affecting the soundness and completeness of the
procedure. However, the simplification keeps our algorithms and proofs
simple, since no substitutions or unifiers are involved in them. We also
assume that T is body-minimal, which is defined as follows:

Definition 5 Let T be a propositional Horn theory. Then T is body-minimal
if there are no two rules $R_1 = A \leftarrow B_1, \ldots, B_n$ and $R_2 = A' \leftarrow C_1, \ldots, C_m$ in
T such that $A = A'$ and $\{B_1, \ldots, B_n\} \subset \{C_1, \ldots, C_m\}$.

If a theory is not body-minimal, it can easily be converted into one by deleting
each rule whose body strictly contains the body of another rule with the same head.
In the rest of the paper, we consider only body-minimal propositional Horn theories.
We begin with the notion of an exp-goal.
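The conversion mentioned above can be sketched in a few lines of Python. This is our own illustration; the function name and the sample theory are hypothetical. It deletes each rule whose body strictly contains the body of another rule with the same head. In a Horn theory such a rule is subsumed by the shorter one, so the least model is unchanged. Comparing against the original rule set is safe. If the shorter rule is itself deleted, an even shorter rule with the same head remains and still subsumes the longer one.

```python
def make_body_minimal(rules):
    """Delete every rule whose body strictly contains the body of another rule with the same head.

    rules: iterable of (head, body) pairs of a propositional Horn theory.
    """
    rules = {(head, frozenset(body)) for head, body in rules}  # also drops duplicate rules
    return {
        (head, body)
        for head, body in rules
        if not any(h == head and b < body for h, b in rules)
    }


# Hypothetical theory: p <- q, r is subsumed by p <- q.
T = [("p", {"q"}), ("p", {"q", "r"}), ("p", {"s"}), ("q", set()), ("r", {"q"})]
print(sorted((h, sorted(b)) for h, b in make_body_minimal(T)))
# [('p', ['q']), ('p', ['s']), ('q', []), ('r', ['q'])]
```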

Definition 6 An exp-goal is a function of the form exp(T, s, u, t), where T is a
body-minimal propositional Horn theory, $t \subseteq T$, s is a set of literals, u is a set of
atoms and $s \cap u = \emptyset$. When $s = \emptyset$, the exp-goal is called an empty exp-goal.

An  esmin-derivation  constructs  a  minimal  explanation  for  a  given  inference  pair. 
It  operates  on  exp-goals. 

Definition 7 Let T be a body-minimal propositional Horn theory and (s,u) an
inference pair. Then an esmin-derivation for (s,u) from T is a sequence of exp-goals
$G_0 = exp(T, s, u, \emptyset), G_1, \ldots$, where each $G_{i+1}$ is derived from $G_i$ (for $i \geq 0$)
as follows. Let $G_i = exp(T, s \cup \{L\}, u, t)$ be an exp-goal where L is a literal, called
the selected literal. Then the exp-goal $G_{i+1}$ is e-derived from $G_i$ as follows:

Case 1. $L = A$ is an atom and $A \leftarrow B_1, \ldots, B_n$ is a rule in T such that

Vi,  1  <  I  <  n,  (B,  ^  u  A  B;  /  AA  C\  . . .  ,Cr  E:  <))•' 

$G_{i+1} = exp(T, s \cup \{B_1, \ldots, B_n\}, u, t \cup \{A \leftarrow B_1, \ldots, B_n\})$

Case 2. $L = \neg B$ and every esmin-derivation from $exp(T, \{B\}, \emptyset, \emptyset)$ fails;
$G_{i+1} = exp(T, s, u, t)$

Definition 8 Let T be a body-minimal propositional Horn theory and let (s,u)
be an inference pair. Then a failed esmin-derivation from $G_0 = exp(T, s, u, \emptyset)$ is
an esmin-derivation which ends in an exp-goal $G_j = exp(T, s', u', t)$ where s' is
non-empty and neither of the two cases is applicable to the selected literal.

Definition 9 Let T be a body-minimal propositional Horn theory and let (s,u)
be an inference pair. Then an esmin-derived explanation t for (s,u) from T is
given by a finite e-derivation of an empty exp-goal $exp(T, \emptyset, u, t)$ from
$G_0 = exp(T, s, u, \emptyset)$. If $G_n = exp(T, \emptyset, u, t)$, we say that the derivation has length n.

The explanation generated by Definition 7 is sound and complete with respect to the set-minimal explanation defined in Definition d. Soundness is shown by first showing that an esmin-refutation provides a weak explanation and then showing that the derived explanation is minimal. The proofs of the following theorems are given in [6].


Theorem 2 (Soundness Theorem). Let T be a body-minimal propositional Horn theory and let (s, u) be an inference pair. Let t be an esmin-derived explanation for (s, u). Then t ∈ El(s, u).

Theorem 3 (Minimality Theorem). Let T be a body-minimal propositional Horn theory and let (s, u) be an inference pair. Let t be an esmin-derived explanation for (s, u). Then t is a set-minimal explanation for (s, u).

Theorem 4. Let T be a body-minimal propositional Horn theory, and let (s, u) be an inference pair. Let t ∈ E_min(s, u). Then there is an esmin-derivation

$$G_0 = \mathrm{exp}(T, s, u, \emptyset), \ \ldots, \ G_n = \mathrm{exp}(T, \emptyset, u, t)$$

for some finite n.

5  Discussion 

The controlled explanation defined in this paper provides an additional method for restricting the explanations generated by a system. A natural question is what effect the constraint has on the explanations, i.e., the effect of u in the inference pair ⟨s, u⟩ on the generation of strong, weak, and minimal explanations. The following lemma holds.

Lemma 8. Let T be a theory and let (s, u) be an inference pair. Then

1) El(s, u) ⊆ El(s, ∅).

2) Ej(s, u) ⊆ Ej(s, ∅).

The property does not hold for either set-minimal or size-minimal explanations, because one can define a set u such that no explanation in E_min(s, u) is in E_min(s, ∅). Hence the set of set-minimal controlled explanations need not be a subset of the uncontrolled set-minimal explanations. The same holds for size-minimal explanations.

The following figure shows the relationship between the various types of explanations defined in this paper.

[Figure: relationships among the types of explanations]


Acknowledgement: We wish to express our appreciation to the National Science Foundation for their support of our work under grant number CCR-91 107*21.




Appendix 

Consider the theory T given as

(1) pavement_wet ← rained

(2) pavement_wet ← overflow

(3) grass_wet ← rained

(4) grass_wet ← sprinkler_on, tap_on

(5) rained ← cloudy_earlier

(6) tap_on ← ¬tap_off

(7) overflow ← sprinkler_on

(8) cloudy_earlier

(9) sprinkler_on

The above theory provides rules for pavement and grass being wet. No information about the tap being off is provided in the theory; hence tap_off can be assumed to be false by the closed world assumption. Now suppose that you want an explanation for the grass and the pavement being wet. Also assume that you have noticed that it was not cloudy earlier in the day. Hence you want to circumscribe your explanations so that they do not include this information. This can be done using the inference pair (s, u) = ⟨{grass_wet, pavement_wet}, {cloudy_earlier}⟩, which is to be explained using T. The following is an esmin-derivation providing a set-minimal explanation.
G_0 = exp(T, {grass_wet, pavement_wet}, {cloudy_earlier}, ∅)

G_1 = exp(T, {pavement_wet, sprinkler_on, tap_on}, {cloudy_earlier}, {(4)})

G_2 = exp(T, {overflow, sprinkler_on, tap_on}, {cloudy_earlier}, {(4), (2)})

G_3 = exp(T, {sprinkler_on, tap_on}, {cloudy_earlier}, {(4), (2), (7)})

G_4 = exp(T, {sprinkler_on, ¬tap_off}, {cloudy_earlier}, {(4), (2), (7), (6)})

G'_0 = exp(T, {tap_off}, ∅, ∅): failure, since all cases in Definition 7 are inapplicable and {tap_off} ≠ ∅.

G_5 = exp(T, {sprinkler_on}, {cloudy_earlier}, {(4), (2), (7), (6)})

G_6 = exp(T, ∅, {cloudy_earlier}, {(4), (2), (7), (6), (9)})

Hence {(4), (2), (7), (6), (9)} is a minimal explanation for the query.
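The derivation above can be replayed mechanically. The Python sketch below is a hypothetical depth-first search over esmin-derivations, not the author's implementation. Rules are encoded as (label, head, body) triples with "-x" standing for ¬x, every choice of selected literal and rule is tried, and the Case 1 side condition is read as "no body literal in u, no body literal equal to the head, and no rule for the head already in t", which is an assumption of this sketch. Because Case 2 calls the search recursively, it also assumes the theory has no recursion through negation.

```python
from typing import FrozenSet, List, Optional, Tuple

# A rule is (label, head atom, body literals); "-x" stands for the literal ¬x.
Rule = Tuple[int, str, Tuple[str, ...]]

APPENDIX_THEORY: List[Rule] = [
    (1, "pavement_wet", ("rained",)),
    (2, "pavement_wet", ("overflow",)),
    (3, "grass_wet", ("rained",)),
    (4, "grass_wet", ("sprinkler_on", "tap_on")),
    (5, "rained", ("cloudy_earlier",)),
    (6, "tap_on", ("-tap_off",)),
    (7, "overflow", ("sprinkler_on",)),
    (8, "cloudy_earlier", ()),
    (9, "sprinkler_on", ()),
]


def atom_of(lit: str) -> str:
    """The atom underlying a literal."""
    return lit[1:] if lit.startswith("-") else lit


def derive(theory: List[Rule], s: FrozenSet[str], u: FrozenSet[str],
           t: FrozenSet[int]) -> Optional[FrozenSet[int]]:
    """Search for an esmin-derivation from exp(T, s, u, t).

    Returns the rule labels t of the first empty exp-goal reached,
    or None if every derivation fails.
    """
    if not s:
        return t  # empty exp-goal exp(T, ∅, u, t)
    for lit in sorted(s):  # try every choice of selected literal
        rest = s - {lit}
        if lit.startswith("-"):
            # Case 2: ¬B goes through if every derivation from exp(T, {B}, ∅, ∅) fails.
            goal = frozenset({atom_of(lit)})
            if derive(theory, goal, frozenset(), frozenset()) is None:
                found = derive(theory, rest, u, t)
                if found is not None:
                    return found
            continue
        # Case 1 (assumed reading): skip heads that already have a rule in t.
        heads_in_t = {head for label, head, _ in theory if label in t}
        if lit in heads_in_t:
            continue
        for label, head, body in theory:
            if head != lit or any(b in u or b == lit for b in body):
                continue
            found = derive(theory, rest | frozenset(body), u, t | {label})
            if found is not None:
                return found
    return None


query = frozenset({"grass_wet", "pavement_wet"})
print(sorted(derive(APPENDIX_THEORY, query, frozenset({"cloudy_earlier"}), frozenset())))
# [2, 4, 6, 7, 9]
print(sorted(derive(APPENDIX_THEORY, query, frozenset(), frozenset())))
# [1, 3, 5, 8]
```

With u = {cloudy_earlier} the search finds the rule set {(2), (4), (6), (7), (9)} of the derivation above; with u = ∅ the first explanation it finds is {(1), (3), (5), (8)}, which goes through cloudy_earlier. This shows how the constraint u steers the derivation away from unwanted explanations.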


References 

1. J. de Kleer. An Assumption-based TMS. Artificial Intelligence, 28:127-162, 1986.

2. J. de Kleer and B.C. Williams. Diagnosing Multiple Faults. Artificial Intelligence, 32:97-130, 1987.

3. J. Doyle. Truth Maintenance System. Artificial Intelligence, 13, 1980.

4. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, second edition, 1987.

5. D. Poole. A Logic for Default Reasoning. Artificial Intelligence, 36:27-47, 1988.

6. A. Rajasekar. Semantics of Explanation Systems, 1992. Submitted.

7. J.A. Reggia, D.S. Nau, and Y. Wang. Diagnostic Expert Systems based on a Set-Covering Model. Int. J. on Man-Machine Studies, 19:437-460, 1983.

8. R. Reiter. A theory of diagnosis from first principles. Technical report, Computer Science Department, University of Toronto, 1986.


Signed  Formulas:  A  Liftable  Meta-Logic 
for  Multiple-Valued  Logics  *


Neil  V.  Murray  and 

Dept. of Computer Science
State  Univ.  of  N.Y.  at  Albany
Albany,  NY  12222
nvm@cs.albany.edu


Erik  Rosenthal
Dept. of Mathematics
University  of  New  Haven 
West  Haven,  CT  06516 
brodsky%nhu.UUCP@yale.edu 


ABSTRACT. We consider means for adapting classical deduction techniques to multiple-valued logics. Some recent work in this area, including our own, utilizes signs (subsets of the set of truth values). In this paper we develop a language of signed formulas that may be interpreted as a meta-level logic. Questions not expressible in the underlying logic are easily expressed in the meta logic, and they may be answered with classical techniques because the logic is classical in nature.

We illustrate the applicability of classical techniques by specifically developing resolution for signed formulas. The meta logic admits a version of Herbrand's Theorem; as a result, these results extend naturally to the first order case. We also briefly discuss how path resolution, path dissolution, and analytic tableaux are easily adapted to signed formulas, and how annotated logics are a special case of signed formulas.


1.  Introduction 

In recent years a number of researchers have studied a variety of non-standard logics ([1,2,3,4,5,7,8,9,10,14,15,16,17,18,22], for example), largely as tools for investigating areas such as modeling uncertainty or natural language processing. These applications require computational deductive techniques. One that several authors have adapted is resolution [2,4,9,10,15,16,22]; others [1,3,5,7,8,14,18] have found the tableau method suitable, largely because it can produce lists of models. Our path dissolution rule [12,13], which generalizes the tableau method, also produces lists of models, and we have been able to adapt it to a class of multiple-valued logics (MVL's). In this paper we describe methods for applying these techniques to a much wider class of logics.

One feature common to much of this work, including our own, is the use of signs (subsets of the set of truth values). An adaptation of standard tableau signs for MVL's was first proposed by Surma [21] at the Int. Symp. on Multi-Valued Logics in 1974 and by Suchon [20] that same year. In 1987 Carnielli extended Surma's work to a wider class of logics [1]. In 1990 Doherty [3] developed a tableau system for a variant of a three-valued logic developed by Sandewall [17]. The notion of a set of truth values as a sign is present in both Suchon's and Doherty's work, but only implicitly. The explicit and formal development of sets-as-signs was introduced independently by Hähnle [7,8] and by the authors [14]. (Gabbay's labeled deductive systems [6] are quite related philosophically, but not technically.)


* This research was supported in part by National Science Foundation grants CCR-9101208 and CCR-92020n.




This approach allows the utilization of classical techniques for the analysis of non-standard logics. This is appealing since, at the meta-level, human reasoning is essentially classical. That is, regardless of the set of truth values associated with the logic, at the meta-level humans interpret statements about the logic to be either true or false. Consider, for example, a logic with three truth values, {0, ½, 1}. We may wish to determine whether a formula ℱ is satisfiable, i.e., whether it can evaluate to 1. Alternatively, one may be interested in whether ℱ can evaluate to other than false, i.e., to ½ or 1. These queries are captured by the signed formulas {1}:ℱ and {½, 1}:ℱ. The answer to such queries is yes or no; that is, either the formula can so evaluate or not. Observe that both the queries and the answers are at the meta-level; observe also that these questions cannot even be formulated at the object-level.
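To make the meta-level reading concrete, the following Python sketch answers a query S:ℱ by enumerating every interpretation of the atoms of ℱ over Δ = {0, ½, 1}. The connectives used here (negation as 1 − x, conjunction as min, disjunction as max) are assumptions chosen for illustration; the text above does not fix them, and the function names are hypothetical. Enumeration is exponential in the number of atoms, so it serves only as a reference point for the inference techniques developed later.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
DELTA = (Fraction(0), HALF, Fraction(1))

# Assumed connectives for illustration only: ¬x = 1 - x, ∧ = min, ∨ = max.
CONNECTIVES = {
    "not": lambda x: 1 - x,
    "and": min,
    "or": max,
}


def atoms(f):
    """Atom names occurring in a formula (a str atom or a tuple (op, *args))."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(g) for g in f[1:]))


def evaluate(f, interp):
    """Value in DELTA of formula f under interp (atom name -> value)."""
    if isinstance(f, str):
        return interp[f]
    op, *args = f
    return CONNECTIVES[op](*(evaluate(g, interp) for g in args))


def answer(sign, f):
    """The meta-level query S:F: does some interpretation map F into S?"""
    names = sorted(atoms(f))
    return any(evaluate(f, dict(zip(names, values))) in sign
               for values in product(DELTA, repeat=len(names)))


F = ("and", "p", ("not", "p"))
print(answer({Fraction(1)}, F))        # False
print(answer({HALF, Fraction(1)}, F))  # True
```

For ℱ = p ∧ ¬p under these connectives, the query {1}:ℱ is answered no, while {½, 1}:ℱ is answered yes (take p = ½). The answers are classical booleans even though ℱ itself is three-valued.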

When the underlying logic is classical, the distinction between meta- and object-levels becomes blurred because both employ what would appear to be the same set of truth values. Of course, true at the meta-level is a positive answer to a query about theoremhood or satisfiability, whereas, at the object-level, true is just one element of the boolean domain.

In this paper, we formalize the notion of signed formulas and their relationship to the base logic from which they are built. Given an MVL Λ, we introduce a new logic Λ_s whose atoms are signed formulas. The advantage of working with Λ_s is twofold. First, a variety of meta-level questions about the truth values to which formulas in Λ can evaluate are directly expressible in Λ_s. Second, many standard inference techniques can easily be adapted to apply to signed formulas.

In the next section the broad class of MVL's to be considered is formally defined, and a formal basis is provided for Λ_s, the language of signed formulas. We also show how Λ_s extends naturally to the first order case. In Section 3, we show how a very general version of the standard resolution rule applies to signed formulas. We conclude by summarizing how the annotated logics of da Costa, Henschen, Lu, and Subrahmanian [2] and of Kifer and Subrahmanian [9] may be regarded as special cases of signed formulas. We also indicate how other techniques such as the tableau method [19] and path dissolution [12,13] can be adapted to Λ_s.

2.  Multiple-Valued  Logics  and  Signed  Formulas 

We assume a (propositional) language Λ consisting of logical formulas built in the usual way from a set 𝒜 of atoms, a set Γ of connectives, and a set χ of logical constants. For the sake of completeness, we precisely define a formula in Λ as follows:

1. Atoms are formulas.

2. If Θ is a connective of arity n and if ℱ_1, ℱ_2, ..., ℱ_n are formulas, then so is Θ(ℱ_1, ℱ_2, ..., ℱ_n).

Associated with Λ is a set Δ of truth values, and an interpretation for Λ is a function from 𝒜 to Δ, i.e., an assignment of truth values to every atom in Λ. A connective Θ of arity n denotes a function Θ: Δ^n → Δ. Interpretations are extended in the usual way to mappings from formulas to Δ. Alternatively, a formula of Λ can be thought of as denoting a mapping from interpretations to Δ.

Many authors are concerned with logics in which there is considerably more structure. For example, a common requirement is that the domain of truth values Δ be a complete distributive lattice in which 0 = glb(Δ) and 1 = lub(Δ), and that the meet and join operations play the roles of conjunction and disjunction, respectively. The results in this paper do not depend on any such restrictions.

We use the term sign for any (expression that denotes a) subset of Δ. We define a signed formula to be an expression of the form S:ℱ, where S is a sign and ℱ is a formula in Λ. We are interested in signed formulas because they represent queries of the form, "Are there interpretations under which ℱ evaluates to a truth value in S?" In a refutational theorem proving setting for classical logic, the query is typically {true}:ℱ, where ℱ is the negation of a goal or conclusion, conjoined with some axioms and hypotheses; the answer hoped for is no. To answer arbitrary queries, we map formulas in Λ to formulas in a classical propositional logic Λ_s.

We call Λ_s the language of signed formulas and define it as follows: the atoms are signed formulas and the connectives are (classical) conjunction and disjunction. We emphasize that a signed formula S:ℱ is an atom in Λ_s regardless of the size or complexity of ℱ and thus has no component parts in the language Λ_s. The set of truth values is of course {true, false}.

2.1.  Δ-Consistent  Interpretations

An arbitrary interpretation for Λ_s may make an assignment of true or false to any signed formula (i.e., to any atom) in the usual way. Our goal is to focus attention only on those interpretations that relate to the sign in a signed formula. To accomplish this, we define a Δ-consistent interpretation for Λ_s to be an interpretation I_s for which there exists an interpretation I for Λ such that, for each atom S:ℱ, S:ℱ is true under I_s if and only if the formula ℱ in Λ is mapped into S by I. Intuitively, Δ-consistent means an assignment of true to all signed formulas whose signs are simultaneously achievable via some interpretation over the original language. If ℱ_1 and ℱ_2 are formulas in Λ_s, we write ℱ_1 ⊨_Δ ℱ_2 whenever, if I_s is a Δ-consistent interpretation and I_s(ℱ_1) = true, then I_s(ℱ_2) = true.

The following lemma is immediate since each interpretation in Λ maps a formula to exactly one element of Δ.

Lemma 1. Let I_s be a Δ-consistent interpretation, let A be an atom and ℱ a formula in Λ, and let S_1 and S_2 be signs. Then:

i) I_s(∅:ℱ) = false;

ii) I_s(Δ:ℱ) = true;

iii) S_1 ⊆ S_2 if and only if S_1:ℱ ⊨_Δ S_2:ℱ for all formulas ℱ;

iv) There is exactly one δ ∈ Δ such that I_s({δ}:A) = true. □
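Because a Δ-consistent interpretation makes S:ℱ true exactly when the underlying value I(ℱ) lies in S, Lemma 1 can be checked exhaustively for a finite Δ. The Python sketch below does this for a three-element Δ; the labels "0", "u", "1" and the helper names signs, lift and entails are hypothetical. For part iii it uses an atom A, which may take any value in Δ; this is the case that forces S_1 ⊆ S_2.

```python
from itertools import chain, combinations

DELTA = ("0", "u", "1")  # placeholder labels for a finite truth set


def signs(delta):
    """Every subset of delta, i.e. every possible sign."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(delta, k) for k in range(len(delta) + 1))]


def lift(value):
    """I_s on signed formulas S:F whose formula F has value `value` under
    the base interpretation I: S:F is true iff value is in S."""
    return lambda sign: value in sign


def entails(s1, s2, delta):
    """S1:A |=_Delta S2:A for an atom A, which may take any value in delta."""
    return all(v in s2 for v in delta if v in s1)


for v in DELTA:
    I_s = lift(v)
    assert not I_s(frozenset())                           # part i
    assert I_s(frozenset(DELTA))                          # part ii
    assert sum(I_s(frozenset({d})) for d in DELTA) == 1   # part iv
for s1 in signs(DELTA):
    for s2 in signs(DELTA):
        assert entails(s1, s2, DELTA) == (s1 <= s2)       # part iii, atomic case
print("Lemma 1 checks pass")
```

The function lift is also the construction used in the Remark below: it builds I_s from a base interpretation I one value at a time.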

Although we feel that the focus on Δ-consistent interpretations is intuitive, it can also be motivated by the following technical observation: the formulas under consideration in Λ_s do not have any occurrences of negation; only ∧ and ∨ appear as classically interpreted symbols. Whether or not such formulas are satisfiable with respect to Δ-consistent interpretations, they are trivially satisfiable with respect to arbitrary interpretations.


2.2.  Δ-atomic  formulas

Many classical inference rules begin with links (complementary pairs of literals). Such rules typically deal only with formulas in which all negations are at the atomic level. Similarly, the inference techniques that we wish to develop here require that signs be at the "atomic level." To that end, we call a formula Δ-atomic if it has the property that whenever S:A is an atom in the formula, then A is an atom in Λ; we call it elementary if whenever S:A is an atom, S is a singleton. Lemma 2 describes a method for driving signs inward for a restricted class of MVL's; repeated applications eventually produce a Δ-atomic formula. The lemma is immediate since the right side of the equation amounts to an enumeration of the interpretations that map the formula on the left side into S.


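Lemma 2. Suppose that the truth domain Δ of an MVL Λ is finite. Let Θ be a connective in Λ of arity n, let ℱ_1, ℱ_2, ..., ℱ_n be formulas in Λ, and let S be a sign. Then if I_s is any Δ-consistent interpretation,

$$I_s\big(S:\Theta(\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n)\big) = I_s\Big(\bigvee_{\langle \delta_1, \ldots, \delta_n \rangle \in \Theta^{-1}(S)} \Big(\bigwedge_{i=1}^{n} \{\delta_i\}:\mathcal{F}_i\Big)\Big). \qquad \square$$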
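For a finite Δ, the right-hand side of Lemma 2 can be generated mechanically from a truth table. The Python sketch below is a minimal illustration, assuming the operands are atoms and a connective is given as a dict from argument tuples to values; for concreteness it uses the binary connective ⊕ of the example in Section 3.3. The names inverse_image, lemma2 and holds are hypothetical.

```python
DELTA = ("0", "u", "1")

# The connective ⊕ of the example in Section 3.3: row = left argument,
# columns = right argument in the order 0, u, 1.
_ROWS = {"0": "0u1", "u": "u11", "1": "111"}
OPLUS = {(x, y): _ROWS[x][i] for x in DELTA for i, y in enumerate(DELTA)}


def inverse_image(table, sign):
    """Θ^{-1}(S): the argument tuples that the connective maps into S."""
    return [args for args, value in table.items() if value in sign]


def lemma2(table, sign, operands):
    """Right-hand side of Lemma 2 for S:Θ(F_1, ..., F_n) as a DNF: one
    conjunction of elementary signed formulas {δ_i}:F_i per tuple in Θ^{-1}(S)."""
    return [[(frozenset({d}), f) for d, f in zip(args, operands)]
            for args in inverse_image(table, sign)]


def holds(dnf, values):
    """Truth of the DNF under the Δ-consistent interpretation induced by
    `values` (operand name -> truth value)."""
    return any(all(values[f] in s for s, f in conj) for conj in dnf)


dnf = lemma2(OPLUS, {"1"}, ("A", "B"))
for a in DELTA:
    for b in DELTA:
        # Lemma 2: the expansion is true exactly when A ⊕ B lands in the sign.
        assert holds(dnf, {"A": a, "B": b}) == (OPLUS[(a, b)] in {"1"})
print(len(dnf))  # 6
```

Here lemma2(OPLUS, {"1"}, ("A", "B")) yields six conjunctions, one per pair in ⊕⁻¹({1}), which illustrates how quickly the elementary form grows.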




Observe that the end product of repeated applications of the lemma will be elementary; this may be undesirable because this representation may be unnecessarily large. The Reduction Lemma in Section 3 may yield a more efficient representation, and knowledge of the connective Θ (for a particular logic) may help to drive the sign inward in a more efficient manner. When Δ is infinite, unless some equivalent of Lemma 2 is available, attention must be restricted to Δ-atomic formulas. These questions are discussed in greater detail in Section 3.

Remark: There is a natural one-to-one correspondence between interpretations over Λ and Δ-consistent interpretations over Λ_s, as follows. Given an interpretation I over Λ, define the interpretation I_s over Λ_s by I_s({δ}:A) = true iff δ = I(A). Since connectives in Λ are functions, I_s uniquely extends to an interpretation over all atoms of Λ_s. Conversely, by part iv of Lemma 1, the value of a Δ-consistent interpretation on elementary atoms determines a unique interpretation over Λ.

2.3.  First  Order  Considerations 

It is straightforward to introduce variable, constant, and function symbols into Λ with their usual meaning. We assume a non-empty domain of discourse 𝒟. There appears to be no natural way to introduce quantifiers into Λ while preserving full generality. Consider, for example, the expression ∀x P(x) in predicate calculus. It may be viewed as implicitly containing a sign, i.e., the quantified expression is true only if P maps all domain elements into {true}. The implicit sign in essence designates a subset of truth values that must contain the range of P(x) in order that the quantified expression denote true. We can generalize this notion by introducing the quantifiers ∀^S and ∃^S: ∀^S x P(x) denotes true only if P maps all domain elements into S, and ∃^S x P(x) denotes true only if P maps some domain element into S. But these quantifiers denote true or false, not arbitrary elements of Δ, and hence fall more naturally within Λ_s.

We may instead view the predicate calculus expression ∀x P(x) as denoting the greatest lower bound of the range of P(x) (where the boolean domain is the obvious two-element lattice). This does not require the concept of sign, and does generalize to Λ. Thus we may define quantifiers ∀ and ∃ as the infimum and supremum, respectively, of the denotations of their arguments; of course, this is possible only if Δ is a lattice. We leave a more thorough exploration of these issues to a later paper.
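The two readings of quantification can be contrasted on a finite domain. In the Python sketch below, the signed quantifiers ∀^S and ∃^S return classical booleans and so belong to Λ_s, while the lattice reading returns an element of Δ. It assumes Δ = {0, ½, 1} ordered as a chain, so that infimum and supremum are simply min and max; the predicate P, its three-element domain, and all function names are hypothetical.

```python
from fractions import Fraction


def forall_signed(sign, values):
    """∀^S x P(x): true iff P maps every domain element into S."""
    return all(v in sign for v in values)


def exists_signed(sign, values):
    """∃^S x P(x): true iff P maps some domain element into S."""
    return any(v in sign for v in values)


def forall_lattice(values):
    """∀x P(x) as the infimum of P's values (min, since Δ is a chain here)."""
    return min(values)


def exists_lattice(values):
    """∃x P(x) as the supremum of P's values."""
    return max(values)


# Hypothetical predicate P over the domain {d1, d2, d3}.
P = {"d1": Fraction(1), "d2": Fraction(1, 2), "d3": Fraction(1)}
print(forall_signed({Fraction(1)}, P.values()))  # False: P(d2) = 1/2
print(exists_signed({Fraction(1)}, P.values()))  # True
print(forall_lattice(P.values()))                # 1/2
```

The first two results are meta-level answers (true or false), while the third is an object-level truth value, which is the distinction drawn in the paragraphs above.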

If we do restrict quantifiers to Λ_s, then all variable occurrences in Λ are free. Formulas having n free variables then denote functions from 𝒟^n to Δ.

It is easy to see that for an arbitrary interpretation over any domain of discourse 𝒟 and any truth domain Δ, a corresponding Herbrand interpretation can be constructed over the Herbrand Universe defined in the usual way. Ground atoms are simply partitioned according to their denotations in Δ. As a result, the notion of a Δ-consistent interpretation is still meaningful if we extend Λ_s to include the quantifiers ∀ and ∃. In classical logic, Herbrand's Theorem can be proved by applying König's Lemma to semantic trees, and it goes through in the same way in Λ_s with respect to Δ-consistent interpretations.

Since extending Λ_s to be a classical first order logic raises no truly new issues, we restrict discussion to the ground case for the remainder of the paper.

3.  Signed  Inference 

In this section, we adapt resolution to produce an inference rule for Λ_s. The techniques developed in [14] and in [15] for dealing with signed formulas from a fairly specialized class of logics also apply to Δ-atomic formulas. The basic idea is the next lemma, which follows immediately from part iv of Lemma 1. First, we say that two formulas ℱ_1 and ℱ_2 in Λ_s are Δ-equivalent if I_s(ℱ_1) = I_s(ℱ_2) for any Δ-consistent interpretation I_s; we write ℱ_1 ≡_Δ ℱ_2.




Lemma 3 (The Reduction Lemma). Let S_1:A and S_2:A be Δ-atomic atoms in Λ_s; then S_1:A ∧ S_2:A ≡_Δ (S_1 ∩ S_2):A and S_1:A ∨ S_2:A ≡_Δ (S_1 ∪ S_2):A. □
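The Reduction Lemma is what allows signed literals on the same atom to be merged. The minimal Python sketch below assumes that signed literals are (sign, atom) pairs with signs as frozensets over a three-element Δ. It checks both equivalences for every pair of signs and every value of A, and the helper reduce_lits (a hypothetical name) merges same-atom literals of a conjunction or a disjunction.

```python
from itertools import chain, combinations

DELTA = frozenset({"0", "u", "1"})


def all_signs(delta):
    """Every subset of delta."""
    items = sorted(delta)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def reduce_lits(lits, combine):
    """Merge signed literals (sign, atom) that share an atom: pass
    frozenset.intersection for a conjunction, frozenset.union for a disjunction."""
    merged = {}
    for sign, atom in lits:
        merged[atom] = combine(merged[atom], sign) if atom in merged else sign
    return {(sign, atom) for atom, sign in merged.items()}


# Exhaustive check of the lemma: A may take any value v in DELTA.
for s1 in all_signs(DELTA):
    for s2 in all_signs(DELTA):
        for v in DELTA:
            assert ((v in s1) and (v in s2)) == (v in (s1 & s2))
            assert ((v in s1) or (v in s2)) == (v in (s1 | s2))
```

For example, merging {0}:A ∨ {u}:A with frozenset.union gives the single literal {0,u}:A, and merging {0,u}:A ∧ {u,1}:A with frozenset.intersection gives {u}:A.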

3.1.  Notation 

The formulas in Λ_s that we are interested in are in negation normal form (NNF): the only connectives used are conjunction and disjunction. (NNF also requires that negations reside at the atomic level. This is irrelevant here since negation is absent altogether from the formulas we are considering.) Conjunctive and disjunctive normal forms (CNF and DNF) are special cases of NNF; considerable time and space may be required to put NNF formulas into CNF or DNF.

We have found that NNF formulas naturally lend themselves to a two-dimensional representation. For example, in Figure 1 below, the formula on the left is displayed graphically on the right:

((¬C ∧ A) ∨ D) ∧ (¬A ∨ (B ∧ C))

Figure 1. [Semantic-graph drawing of the formula: disjunctions laid out horizontally, conjunctions vertically.]

Disjunctions are displayed horizontally, conjunctions vertically. A formula so represented is called a semantic graph; for a detailed exposition, see [11].

Since conjunction and disjunction are commutative and associative, their arguments, when the formula is viewed as a graph, are called fundamental subgraphs. The fundamental subgraphs of the upper disjunction in the graph above are the conjunction ¬C ∧ A and the literal D.

The graph above contains four c-paths (maximal conjunctions of literal occurrences): (¬C, A, ¬A), (¬C, A, B, C), (D, ¬A), (D, B, C). We say that two literal occurrences are c-connected if they are both in some c-path. With respect to arbitrary interpretations, a semantic graph is unsatisfiable if and only if every c-path is unsatisfiable, and a c-path is unsatisfiable if and only if it contains a link. The formulas we study in Λ_s have no links in the classical sense, but, as we shall see, the notion of link can be redefined with respect to signs in a meaningful way.
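C-paths can be enumerated directly from the structure of an NNF formula: a disjunction contributes the c-paths of each disjunct, and a conjunction combines one c-path from each conjunct. The Python sketch below assumes a hypothetical nested-tuple encoding of classical NNF formulas, with "-x" for the negative literal ¬x, and applies it to the formula of Figure 1.

```python
from itertools import product


def c_paths(f):
    """All c-paths of an NNF formula, each as a tuple of literal occurrences.
    A formula is a literal (str, "-x" for ¬x) or a tuple ("and"|"or", *args)."""
    if isinstance(f, str):
        return [(f,)]
    op, *args = f
    if op == "or":   # every c-path of every disjunct
        return [p for g in args for p in c_paths(g)]
    if op == "and":  # one c-path from each conjunct, concatenated
        return [sum(combo, ()) for combo in product(*(c_paths(g) for g in args))]
    raise ValueError(f"unknown connective {op!r}")


def has_link(path):
    """True if the path contains a complementary pair x, -x."""
    return any(("-" + lit) in path for lit in path if not lit.startswith("-"))


# Figure 1: ((¬C ∧ A) ∨ D) ∧ (¬A ∨ (B ∧ C))
FIG1 = ("and", ("or", ("and", "-C", "A"), "D"),
               ("or", "-A", ("and", "B", "C")))

for p in c_paths(FIG1):
    print(p, has_link(p))
```

The first two c-paths contain links (A, ¬A and ¬C, C), but (D, ¬A) and (D, B, C) do not, so under arbitrary interpretations the formula of Figure 1 is satisfiable.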


3.2.  Signed  resolution 

In this section we generalize the resolution inference rule for classical logic. Combined with the Reduction Lemma, this generalization will lead naturally to an inference rule for Λ_s.

Let ℱ be a conjunction of disjunctions in a classical logic. We use the term clause for the disjunctions even though we do not assume them to be disjunctions of literals. Let two of the clauses be

$$C = Q_C \vee \bigvee_{j=1}^{m_C} J_j \qquad\text{and}\qquad D = Q_D \vee \bigvee_{j=1}^{m_D} K_j.$$

Then we may resolve clauses C and D and infer

$$(Q_C \wedge Q_D) \vee \bigvee_{j=1}^{m_C} J_j \vee \bigvee_{j=1}^{m_D} K_j.$$




Soundness follows from noting that if C ∧ D is true, but none of the J_j's and none of the K_j's are, then Q_C and Q_D must both be true. Observe that when C and D are clauses of literals and Q_C and Q_D are complementary, Q_C ∧ Q_D is unsatisfiable and drops out, so this rule reduces to standard binary resolution.

Consider now a Δ-atomic formula ℱ in Λ_s that is in conjunctive normal form (CNF). As such, the clauses of ℱ are sets of literals. Let C_j, 1 ≤ j ≤ r, be clauses in ℱ that contain, respectively, Δ-atomic atoms S_j:A. Thus we may write C_j = K_j ∪ {S_j:A}. Then the resolvent R of the C_j's is defined to be the clause

$$R = \Big(\bigcup_{j=1}^{r} K_j\Big) \cup \Big\{\Big(\bigcap_{j=1}^{r} S_j\Big):A\Big\};$$

this definition is the one given above, generalized to r clauses and specialized to Δ-consistent interpretations via the Reduction Lemma.

We must augment this definition with the following obvious simplification rules, which also stem from the Reduction Lemma. First, if ⋂_{j=1}^{r} S_j is empty, then (⋂_{j=1}^{r} S_j):A is false and may simply be deleted from R. Second, whenever S_1:B and S_2:B are in R, we replace them by (S_1 ∪ S_2):B; if S_1 ∪ S_2 = Δ, then R is a tautology and may be deleted from ℱ.

The classical notion of subsumption also generalizes: clause C subsumes clause D if, for every literal S:A ∈ C, there is a literal S':A ∈ D such that S ⊆ S'.
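The resolution rule, the two simplification rules, and subsumption fit in a few lines of code. The Python sketch below is one possible encoding, not the authors' implementation. A clause is a frozenset of (sign, atom) literals over Δ = {0, u, 1}, the helper lit parses a comma-separated sign, and the names simplify, resolve and subsumes are hypothetical. A resolvent that simplifies to a tautology is returned as None, and the empty clause is frozenset().

```python
from functools import reduce
from typing import FrozenSet, Iterable, List, Optional, Tuple

DELTA = frozenset({"0", "u", "1"})
Lit = Tuple[FrozenSet[str], str]   # (S, A) stands for the signed atom S:A
Clause = FrozenSet[Lit]


def lit(sign: str, atom: str) -> Lit:
    """Parse a literal such as lit("0,u", "A") for {0,u}:A."""
    return (frozenset(sign.split(",")), atom)


def simplify(lits: Iterable[Lit]) -> Optional[Clause]:
    """Merge S1:B and S2:B into (S1 ∪ S2):B, drop literals with an empty
    sign, and return None if some literal becomes Δ:B (a tautology)."""
    merged = {}
    for sign, atom in lits:
        merged[atom] = merged.get(atom, frozenset()) | sign
    if any(sign == DELTA for sign in merged.values()):
        return None
    return frozenset((sign, atom) for atom, sign in merged.items() if sign)


def resolve(clauses: List[Clause], atom: str) -> Optional[Clause]:
    """Signed resolvent of C_j = K_j ∪ {S_j:atom}: (∪ K_j) ∪ {(∩ S_j):atom}."""
    signs = [s for c in clauses for s, a in c if a == atom]
    if len(signs) != len(clauses):
        raise ValueError("each clause needs exactly one literal on the atom")
    rest = [x for c in clauses for x in c if x[1] != atom]
    return simplify(rest + [(reduce(frozenset.intersection, signs), atom)])


def subsumes(c: Clause, d: Clause) -> bool:
    """C subsumes D if each S:A in C has some S':A in D with S ⊆ S'."""
    return all(any(a == b and s <= t for t, b in d) for s, a in c)


C1 = frozenset({lit("0,u", "A"), lit("u,1", "B")})
C2 = frozenset({lit("0,u", "B")})
R = resolve([C1, C2], "B")
print(sorted((sorted(s), a) for s, a in R))  # [(['0', 'u'], 'A'), (['u'], 'B')]
```

Resolving {{0,u}:A, {u,1}:B} with {{0,u}:B} on B gives {{0,u}:A, {u}:B}; resolving {0,u}:A against {1}:A gives the empty clause, because the intersected sign is empty and that literal is deleted.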

3.3.  An  example 

Consider the logic Λ based on the three-valued domain {0, u, 1}, ordered so that 0 < 1 and u < 1, but 0 and u are not comparable. There are two unary connectives, ¬ and ~, and two binary connectives, ⊗ and ⊕; their truth tables are shown below.


x | ¬x | ~x
--+----+---
0 | 1  | 1
u | 1  | 1
1 | 0  | u

⊗ | 0  u  1
--+--------
0 | 0  0  0
u | 0  u  u
1 | 0  u  1

⊕ | 0  u  1
--+--------
0 | 0  u  1
u | u  1  1
1 | 1  1  1

Observe that although the two binary connectives are "and-like" and "or-like", Δ is not a lattice: 0 and u have no common lower bound.

Consider the formula

ℱ = (~A ⊕ B) ⊗ ¬(A ⊕ B).

We may ask if ℱ is "satisfiable," i.e., if it can possibly evaluate to 1, by considering the formula {1}:ℱ in Λ_s. If {1}:ℱ does indeed have Δ-consistent models, we may be interested in knowing them; this situation favors the use of analytic tableaux or path dissolution in the analysis, since these methods can effectively determine satisfying interpretations. Alternatively, if we suspect that ℱ always evaluates to 1, then a refutation procedure such as resolution could be employed on a Δ-equivalent of {0,u}:ℱ.
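Because Δ here has only three elements, the query {1}:ℱ can also be answered by brute force, which gives a reference answer against which the signed-formula derivation in this section can be compared. The Python sketch below encodes the truth tables of this example as dicts (the names NEG, TILDE, OTIMES, OPLUS, F and models are hypothetical) and lists the interpretations of A and B under which ℱ evaluates into a given sign.

```python
from itertools import product

DELTA = ("0", "u", "1")
NEG = {"0": "1", "u": "1", "1": "0"}    # ¬
TILDE = {"0": "1", "u": "1", "1": "u"}  # ~


def table(rows):
    """Binary truth table from rows indexed by the left argument; each row
    lists the values for right arguments 0, u, 1."""
    return {(x, y): rows[x][i] for x in DELTA for i, y in enumerate(DELTA)}


OTIMES = table({"0": "000", "u": "0uu", "1": "0u1"})  # ⊗
OPLUS = table({"0": "0u1", "u": "u11", "1": "111"})   # ⊕


def F(a, b):
    """Value of F = (~A ⊕ B) ⊗ ¬(A ⊕ B) when A = a and B = b."""
    return OTIMES[(OPLUS[(TILDE[a], b)], NEG[OPLUS[(a, b)]])]


def models(sign):
    """Interpretations (A, B) that map F into the given sign."""
    return [(a, b) for a, b in product(DELTA, repeat=2) if F(a, b) in sign]


print([args for args, v in OTIMES.items() if v == "1"])  # [('1', '1')]
print(models({"1"}))  # [('0', '0'), ('0', 'u'), ('u', '0')]
```

The enumeration confirms ⊗⁻¹({1}) = {⟨1, 1⟩} and finds exactly three models of {1}:ℱ, namely (A, B) = (0, 0), (0, u) and (u, 0). Since ℱ does not always evaluate to 1, a refutation of {0,u}:ℱ would not succeed for this formula.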

For our example, we use the signed formula {1}:ℱ:

{1}:ℱ = {1}:((~A ⊕ B) ⊗ ¬(A ⊕ B))

Since ⊗⁻¹(1) = {⟨1, 1⟩}, Lemma 2 gives us:

{1}:(~A ⊕ B) ∧ {1}:¬(A ⊕ B)




Direct application of Lemma 2 to both conjuncts of the above formula leads to the (not yet Δ-atomic) formula of Figure 2 below:


[({1}:~A ∧ {1}:B) ∨ ({1}:~A ∧ {u}:B) ∨ ({1}:~A ∧ {0}:B) ∨ ({u}:~A ∧ {1}:B) ∨ ({0}:~A ∧ {1}:B) ∨ ({u}:~A ∧ {u}:B)]
∧
[{0}:(A ⊕ B) ∨ {u}:(A ⊕ B)]

Figure 2.

The careful reader will have noticed that the formula above can be simplified. Factoring the leftmost three disjuncts in the upper half of the graph on the atom {1}:~A results in the formula {1}:~A ∧ ({0}:B ∨ {u}:B ∨ {1}:B). This is just {1}:~A ∧ Δ:B by the Reduction Lemma, and, by part ii of Lemma 1, it reduces further to {1}:~A. Similarly, the three disjuncts containing {1}:B (using a copy of the leftmost disjunct) reduce to {1}:B.

From  the  discussion  above,  and  from  the  application  of  Lemma  2  to  the  lower 
half  of  Figure  2,  the  formula  in  Figure  3  results. 


[{1}:~A ∨ {1}:B ∨ ({u}:~A ∧ {u}:B)]
∧
[({0}:A ∧ {0}:B) ∨ ({0}:A ∧ {u}:B) ∨ ({u}:A ∧ {0}:B)]

Figure 3.


By applying Lemmas 2 and 3 to the upper part of Figure 3, and by factoring (on {0}:A, {u}:A, and finally on {0,u}:B) and applying Lemma 3 to the lower part, we obtain Figure 4.


[{0,u}:A ∨ {1}:B ∨ ({1}:A ∧ {u}:B)]
∧
[({0}:A ∧ {0,u}:B) ∨ ({u}:A ∧ {0}:B)]

Figure 4.

The formula in Figure 4 is a conjunction but is not in CNF. We may convert the upper conjunct into the two clauses {{0,u}:A, {1}:B, {1}:A} and {{0,u}:A, {1}:B, {u}:B}, which are Δ-equivalent to true and {{0,u}:A, {u,1}:B}, respectively. Similarly, {{0,u}:B} is one of the clauses obtainable from the lower conjunct. As an example of a resolution step, resolving {{0,u}:A, {u,1}:B} and {{0,u}:B} produces {{0,u}:A, {u}:B}. Note that in this simple example, however, the clause {{0,u}:A}, also obtainable from the lower conjunct, subsumes {{0,u}:A, {u,1}:B}; in effect, the upper conjunct itself is subsumed and could be dropped.

The example illustrates the explosive growth that can occur with applications of Lemma 2. Lemmas 1 and 3 can be used to reduce the size of the formula, but the process of applying Lemma 2 and then condensing the result is likely to be expensive. For a given connective in a given logic Λ, the simplifications available from the lemmas may be precomputable, producing rules that directly yield the more condensed formulas. Consider, for example, the truth table for ⊕. It is easy to see that the following rule is valid:

{1}:(A ⊕ B) ≡_Δ {1}:A ∨ {1}:B ∨ ({u}:A ∧ {u}:B).

Even  this  rule  can  lead  to  exponential  growth,  but  this  is  analogous  to  the  cost  of 
putting  a  classical  formula  into  CNF. 
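The precomputed rule for ⊕ can be validated by enumeration: both its right-hand side and the six-disjunct expansion given by Lemma 2 must agree with the truth table of ⊕ on all nine pairs of values. A minimal Python sketch follows; the function names are hypothetical, and the table rows repeat the ⊕ table of this example.

```python
from itertools import product

DELTA = ("0", "u", "1")
OPLUS_ROWS = {"0": "0u1", "u": "u11", "1": "111"}  # row = A, columns B = 0, u, 1


def oplus(a, b):
    return OPLUS_ROWS[a][DELTA.index(b)]


def expanded(a, b):
    """Lemma 2 form of {1}:(A ⊕ B): a disjunction over ⊕^{-1}({1})."""
    return any(a == x and b == y
               for x, y in product(DELTA, repeat=2) if oplus(x, y) == "1")


def condensed(a, b):
    """Precomputed rule: {1}:A ∨ {1}:B ∨ ({u}:A ∧ {u}:B)."""
    return a == "1" or b == "1" or (a == "u" and b == "u")


for a, b in product(DELTA, repeat=2):
    assert expanded(a, b) == condensed(a, b) == (oplus(a, b) == "1")
print(sum(oplus(x, y) == "1" for x, y in product(DELTA, repeat=2)))  # 6
```

The condensed rule needs three disjuncts where the direct expansion needs six, which is the kind of saving that precomputing rules per connective is meant to deliver.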

More generally, the expense of driving signs inward depends on the efficiency of such rules. Lemma 2 is a very inefficient means of accomplishing this task; the above rule is more reasonable. With some connectives in some logics it may not be possible to develop efficient rules, but such cases are likely to reflect the inherent complexity of the base logic.

3.4.  Completeness

The resolution rule presented above was introduced in [15] for a class of multiple-valued logics. There, the truth value domains were assumed to obey rather severe restrictions, and all queries had the form: Can a formula evaluate to 1 (the maximal element of a truth value lattice)? Refutation completeness was proven using a modified semantic tree argument in which interpretations over the MVL were represented.

In this paper, we are dealing with a much wider class of MVL's and are allowing more general queries about the formulas in those MVL's. The resolution rule and the signed formulas on which it operates, however, are defined in a classical logic. In this setting, completeness is not a statement about what formulas in Λ can be proved inconsistent; rather, completeness means that we can answer certain queries about formulas in Λ by deriving the empty clause in Λ_s. Nevertheless, the semantic tree construction in [15] can be adapted to produce a completeness argument for Λ_s.

Theorem 1. Let $\mathcal{F}$ be a formula in $\Lambda$, and let $S$ be a subset of $\Delta$ such that no
interpretation maps $\mathcal{F}$ into $S$. Then there is a derivation of the empty clause from
any $\Lambda$-atomic equivalent of $S{:}\mathcal{F}$ using signed resolution. □

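The following is a minimal ground (propositional) sketch of resolution on signed clauses. It assumes that two clauses containing $S_1{:}p$ and $S_2{:}p$ yield a resolvent containing $(S_1 \cap S_2){:}p$ together with all remaining literals, and that a literal with an empty sign is deleted. It resolves two clauses at a time and merges $S_1{:}p \lor S_2{:}p$ into $(S_1 \cup S_2){:}p$. The three-valued domain and all names are hypothetical. No claim is made that this binary variant mirrors the completeness argument behind Theorem 1 in every detail.

```python
from itertools import combinations

DELTA = frozenset({"0", "u", "1"})  # hypothetical three-valued truth domain


def normalize(literals):
    """Build a clause (a frozenset of (atom, sign) literals). Literals on the same
    atom are merged, since S1:p | S2:p == (S1 | S2):p, and literals with an
    empty sign are dropped, since they are false under every interpretation."""
    merged = {}
    for atom, sign in literals:
        merged[atom] = merged.get(atom, frozenset()) | frozenset(sign)
    return frozenset((atom, sign) for atom, sign in merged.items() if sign)


def is_tautology(clause):
    # A literal signed with the whole domain is true under every interpretation.
    return any(sign == DELTA for _, sign in clause)


def resolvents(c1, c2):
    """Binary signed resolvents: (S1:p | C1), (S2:p | C2) => (S1 & S2):p | C1 | C2."""
    s1, s2 = dict(c1), dict(c2)
    for atom in s1.keys() & s2.keys():
        rest = [lit for lit in c1 | c2 if lit[0] != atom]
        yield normalize(rest + [(atom, s1[atom] & s2[atom])])


def refutes(clauses):
    """Saturate under signed resolution; True iff the empty clause is derived."""
    known = {normalize(c) for c in clauses}
    known = {c for c in known if not is_tautology(c)}
    if frozenset() in known:
        return True
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True
                if not is_tautology(r) and r not in known:
                    new.add(r)
        if not new:
            return False
        known |= new


# {1}:p | {1}:q,  {0,u}:p,  {0,u}:q : no interpretation satisfies all three.
unsat = [[("p", {"1"}), ("q", {"1"})], [("p", {"0", "u"})], [("q", {"0", "u"})]]
print(refutes(unsat))                                   # True
print(refutes([[("p", {"1"})], [("p", {"1", "u"})]]))  # False
```

Saturation always terminates here because a finite set of atoms and a finite domain admit only finitely many normalized clauses.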
4.  Conclusions 

Signed resolution appears to provide a unifying framework for most adaptations of
resolution to MVL’s. For example, the annotated logics (also called paraconsistent
logics) introduced by da Costa, Henschen, Lu, and Subrahmanian [2], and by Kifer
and Subrahmanian [9] employ two inference rules, p-resolution and reduction
(sometimes referred to as mega-resolution and cloning). Both rules are special cases
of signed resolution. A question worthy of investigation is whether signed resolution
yields some form of linear resolution for these logics.

The technique developed for applying resolution to MVL’s via signed formulas
works with most classical inference techniques. For example, path resolution, a
generalization of resolution that permits resolving on sets of links in formulas in
NNF, can easily be adapted to $\Lambda_S$. Hähnle’s work [7,8] with tableaux does this for
the method of analytic tableaux, and path dissolution, of which the tableau method is
one special case, was similarly adapted in [15]. Space does not permit a detailed
description in this paper, but the adaptations are quite straightforward because the
logic of signed formulas is a classical logic.

Our approach in this paper is very general in that there are no requirements on
the structure of the domain of truth values; logical connectives are similarly unrestricted.
However, when $\Delta$ is infinite, we must begin with $\Lambda$-atomic formulas; for
finite $\Delta$, arbitrary formulas in $\Lambda_S$ can in principle be handled, but this can lead to
explosive growth when Lemma 2 is employed directly. It would be useful to investigate
the relationship between conditions placed on $\Delta$ and on the connectives, and
the resulting rules for deriving $\Lambda$-atomic formulas. Hähnle [8] has investigated this
issue from the standpoint of analytic tableau procedures. It would also be useful to
discover a class of logics for which $\Delta$ is infinite and for which some technique
(analogous to Lemma 2) for driving signs inward can be developed.

Acknowledgements 

Discussions  with  James  Lu  have  been  helpful  in  clarifying  our  understanding 
of  annotated  logics.  We  are  grateful  to  him  for  pointing  out  the  relationship  between 
signed  resolution  and  resolution  for  annotated  logics.  We  have  also  benefited  from 
discussions with Reiner Hähnle and with V.S. Subrahmanian.

References 

1. Carnielli, W.A. Systematization of finite many-valued logics through the
method of tableaux. Journal of Symbolic Logic, 52, 2 (June 1987), 473-493.

2. da Costa, N.C.A., Henschen, L.J., Lu, J.J., and Subrahmanian, V.S.
Automatic Theorem Proving in Paraconsistent Logics: Theory and Implementation.
Proceedings of the 10th International Conference on Automated Deduction,
Kaiserslautern, W. Germany, July 1990. In Lecture Notes in Artificial
Intelligence, Springer Verlag, Vol. 449, 72-86.

3. Doherty, P. Preliminary report: NM3 - A three-valued non-monotonic formalism.
Proceedings of the Fifth International Symposium on Methodologies for
Intelligent Systems, Knoxville, Tennessee, October 25-27, 1990. In Methodologies
for Intelligent Systems, 5 (Ras, Z., Zemankova, M., and Emrich, M.,
eds.) North-Holland, 1990, 498-505.

4. Fitting, M. Resolution for Intuitionistic Logic. In Methodologies for Intelligent
Systems, (Ras, Z. and Zemankova, M., eds.) North-Holland, 1987, 400-407.

5.  Fitting,  M.  First-order  modal  tableaux.  J.  Automated  Reasoning  4,  (1988) 
191-213. 

6.  Gabbay,  D.  LDS  -  Labeled  Deductive  Systems.  Oxford  University  Press,  to 
appear. 

7. Hähnle, R. Towards an efficient tableau proof procedure for multiple-valued
logics. Proceedings of the Workshop on Computer Science Logic, Heidelberg,
1990. In Lecture Notes in Computer Science, Springer Verlag, Vol. 533, 248-260.

8. Hähnle, R. Uniform notation tableau rules for multiple-valued logics.
Proceedings of the International Symposium on Multiple-Valued Logic,
Victoria, BC, May 26-29, 1991, 238-245.




9.  Kifer,  M.  and  Subrahmanian,  V.S.  On  the  expressive  power  of  annotated 
logic  programs.  Proceedings  of  the  1989  North  American  conference  on 
Logic  Programming,  Cleveland,  OH,  1069-1089. 

10.  Morgan,  C.  Resolution  for  many-valued  logics.  Logique  et  Analyse  74-76 
(1976),  311-339. 

11. Murray, N.V., and Rosenthal, E. Inference with Path Resolution and Semantic
Graphs. JACM 34, 2 (April 1987), 225-254.

12.  Murray,  N.V.,  and  Rosenthal,  E.  Dissolution:  Making  paths  vanish.  To 
appear,  JACM. 

13. Murray, N.V., and Rosenthal, E. Employing path dissolution to shorten
tableau proofs. Proceedings of the 1989 International Symposium on Symbolic
and Algebraic Computation, Portland, Oregon, July 17-19, 1989, 373-381.

14. Murray, N.V., and Rosenthal, E. Improving tableaux proofs in multiple-valued
logic. Proceedings of the 21st International Symposium on Multiple-Valued
Logic, Victoria, B.C., Canada, May 26-29, 1991, 230-237.

15. Murray, N.V., and Rosenthal, E. Resolution and path dissolution in multiple-valued
logics. Proceedings of the International Symposium on Methodologies
for Intelligent Systems, Charlotte, NC, October 16-19, 1991. In Lecture Notes
in Artificial Intelligence, Springer-Verlag, Vol. 542, 570-579.

16. O’Hearn, P., and Stachniak, Z. Note on theorem proving strategies for resolution
counterparts of non-classical logics. Proceedings of the 1989 International
Symposium on Symbolic and Algebraic Computation, Portland, Oregon,
July 17-19, 1989, 364-372.

17. Sandewall, E. The semantics of non-monotonic entailment defined using partial
interpretations. In M. Ginsberg, M. Reinfrank, and E. Sandewall, eds.,
Non-Monotonic Reasoning, 2nd International Workshop. Springer, 1988.

18. Schwind. A tableau based theorem prover for a decidable subset of default
logic. Proceedings of the 10th International Conference on Automated Deduction,
Kaiserslautern, W. Germany, July 24-27, 1990. In Lecture Notes in
Artificial Intelligence, Springer-Verlag, Vol. 449, 528-542.

19.  Smullyan,  R.M.  First-Order  Logic.  Springer  Verlag,  1968. 

20. Suchon, W. La méthode de Smullyan de construire le calcul n-valent de
Łukasiewicz avec implication et négation. Reports on Mathematical Logic,
Universities of Cracow and Katowice, 1974, 2, 37-42.

21. Surma, S.J. An algorithm for axiomatizing every finite logic. In Computer
Science  and  Multiple-Valued  Logics,  David  C.  Rine,  Ed.,  North-Holland, 
Amsterdam,  1984,  143-149. 

22.  Stachniak,  Z.  Note  on  resolution  circuits.  Proceedings  of  the  International 
Symposium  on  Methodologies  for  Intelligent  Systems,  Charlotte,  NC,  October 
16-19,  1991.  In  Lecture  Notes  in  Artificial  Intelligence,  Springer-Verlag,  Vol. 
542,  620-629. 


New  Design  Concepts  for  the  FLINS-Fuzzy  Lingual 
System:  Text-based  and  Fuzzy-centered 
Architectures 

Shun'ichi  Tano,  Wataru  Okamoto,  and  Toshiharu  Iwatani 
Laboratory  for  International  Fuzzy  Engineering  Research 
Siber Hegner Building 3F, 89-1 Yamashita-cho,
Naka-ku, Yokohama, 231, JAPAN

Abstract.  A  fuzzy  natural  language  communication  system  called  the 
Fuzzy  Lingual  System  (FLINS)  is  currently  under  development  at  the 
Laboratory  for  International  Fuzzy  Engineering  Research  (LIFE).  The  final 
goal of the FLINS project is to create a lingual computer, that is, a domain-independent
teach, question, and answer (TQA) system. In this paper, we
propose  two  new  design  concepts,  text-based  architecture  and  fuzzy-centered 
architecture,  to  realize  our  goal.  Two  experimental  systems  were  built  to 
determine  the  feasibility  of  those  approaches  before  the  development  of 
FLINS.  One  is  a  text-based  natural  language  understanding  system,  called  the 
AB-System.  The  other  is  a  fuzzy  expert  system  called  FOREX,  which 
predicts  exchange  rate  trends  according  to  fuzzy  rules  and  fuzzy  data. 

1.  Introduction 

Even though computer systems have great computing power and can hold a great
amount of information, current user interfaces are so rigid and unfriendly that only
specialists can make full use of these systems. To overcome this barrier, we at the
Laboratory for International Fuzzy Engineering Research (LIFE) are developing a natural
language communication system called FLINS, short for Fuzzy Lingual System. Our
ultimate goal is to create a lingual computer that functions as a domain-independent
teach, question, and answer (TQA) system. The problem is an old one, but it is still
open.

FLINS  is  based  on  two  new  design  concepts,  text-based  architecture  and  fuzzy-centered 
architecture.  The  former  was  directly  derived  from  the  problems  associated  with 
conventional natural language understanding systems, and can be viewed as an AI-related
problem. The latter concept is a little different and may be quite new to people
in the AI community.

2.  Motivation  and  Basic  Approach 

2.1  Motivation 

Our  final  goal  is  a  lingual  computer  that  facilitates  the  interaction  between  users  and 
computer  systems.  The  fundamental  features  are: 

Communication  by  means  of  natural  language 

A user and the system should be able to communicate by means of natural language.

Teaching by means of natural language

A  user  should  be  able  to  teach  the  system  new  knowledge  by  means  of  natural 
language.  The  new  knowledge  includes  meta-knowledge  for  control  of  the  system 
itself,  rule-type  knowledge,  and  simple  factual  information. 

Learning  by  means  of  natural  language 

The system should be able to store all interactions between a user and the system in a
knowledge  base  (which  we  call  a  text  base  because  all  interactions  use  text).  The 
system  should  be  able  to  manage  unknown  situations  by  using  this  text  as  case  data. 




2.2  Basic  Approach 

Our basic approach is to combine three key technologies: knowledge engineering,
natural language processing, and fuzzy engineering. The following is a brief analysis of
the successes and failures of these technologies.

(1)  Knowledge  Engineering 

The key idea in knowledge engineering (KE) is "knowledge is power", and a new
architecture for intelligent systems based on this idea was proposed. While the
architecture that separates the knowledge base from the inference mechanism has proven
useful, the difficulty of knowledge representation has kept it from being used extensively.

(2)  Natural  Language  Processing 

SHRDLU is the most famous natural language processing system. It worked quite
well in a blocks world, but the system was never generalized beyond it.

A number of machine translation systems are currently in use [7]. Although their
translation capability is limited, they can transform most text into a case-based
structure [3], and they accept an almost unlimited range of text. Moreover, these
machine translation systems use very similar sets of "cases". This demonstrates that
methodologies have been established almost up to the "case structure" level.

(3)  Fuzzy  Engineering 

Fuzzy theory was proposed by L.A. Zadeh in 1965 [12]. At first this theory was
ignored, but as the number of successful practical applications increased, especially in
the area of fuzzy control, this situation changed rapidly. The theory was originally
proposed to deal with the fuzziness found in natural language, so it provides us with a
good framework for coping with such fuzziness.

3. Problems with Conventional Systems [e.g., 1, 5, 9]

(1)  Closed  Nature 

It is impossible to provide a natural language understanding system with every piece
of knowledge it needs as inherent knowledge: knowledge about the target field and
meta-knowledge, as well as word definitions and knowledge of sentence structure.
Therefore, it is very important that users can change what the system already knows
and can teach the system what it does not know through natural language.

Such  extensibility  is  a  key  attribute  for  a  true  natural  language  understanding  system. 
However,  most  conventional  systems  do  not  allow  users  to  use  natural  language  to 
teach  and  change  knowledge.  Although  a  few  systems  that  can  accept  knowledge  in 
natural  language  exist,  they  can  only  accept  simple  knowledge  and  their  treatment  of 
the  knowledge  is  ad  hoc. 

(2) Implicit and Inherent Knowledge Represented in a Special Form

Most systems have implicit and inherent knowledge represented in a special form,
such as program source code (e.g., LISP code), frames, and scripts. In some cases,
special tricks are hidden in this manner.

The big problem with implicit knowledge is that the system loses transparency.
Thus most natural language understanding systems work only for specific fields and
are never applied in other fields. Furthermore, the more intelligent the system, the
more likely it is that important knowledge is represented in a specific form as implicit
knowledge (tricks). The problem with representing knowledge in a special form is that
it is almost impossible for users who are not familiar with that form to change the
knowledge or add new knowledge.

Note that we do not deny the effectiveness of representations such as rules and frames.
We only insist that a user should be able to access the important knowledge, so it has
to be accessible by means of natural language instead of only through the special form.
It is no problem for knowledge to be represented in a special form as long as it can be
accessed by means of natural language.

(3)  Using  Deep  Structure  to  Represent  Meanings 

Most natural language understanding systems adopt a knowledge representation that is
based on deep structures, such as CD [8] or logical forms [1]. In this representation, the
meaning of the text is represented using deep semantic primitives. For example, "to
give" is transformed into "to transfer ownership to others". Of course, this
representation is sound and is suitable for manipulating meaning, because it is a
sort of canonical form of the meaning. However, there is one inherent problem with
this type of representation. To transform text into a deep structure, the system must
completely understand the meaning of the text, which is very difficult even for
humans. Moreover, defining a complete and domain-independent set of deep
semantic primitives, together with the rules for transforming text into deep structures, is
actually impossible.

We  think  this  is  the  reason  most  existing  natural  language  understanding  systems  can 
never  be  applied  outside  of  a  fixed  field  or  the  limited  situations  they  were  originally 
designed  to  handle.  From  the  theoretical  point  of  view,  the  concept  of  deep  structure 
is  attractive,  but  it  is  not  powerful  enough  to  devise  a  natural  language 
understanding  system  that  is  applicable  to  various  real  fields. 

(4) Ignorance of Fuzziness

One essential characteristic of natural language is fuzziness. This fuzziness ranges
from the ambiguity caused by multiple parse trees for one sentence or multiple
meanings for one word to the fuzziness of the meaning itself. Although the former
type of fuzziness has been studied thoroughly in work on disambiguation algorithms, the
latter type has not; so far it has been ignored in studies of natural language processing.

4.  New  Design  Concepts 

4.1 Text-Based Architecture

The basic concept of the text-based architecture is that all knowledge should be
represented as text. "All knowledge" includes linguistic knowledge (i.e., word
definitions and knowledge of sentence structure), knowledge of the target field (i.e.,
domain-dependent rules and data), and meta-knowledge (i.e., how to use the knowledge
itself and how to control the system with it). We refer to this collection of knowledge
as a text base rather than as a knowledge base to express clearly that all knowledge is
represented as text.

The system uses only a basic pattern matcher, controlled by a schema described
in text, to process all knowledge. This feature allows the user to use text to teach
even meta-knowledge, because the process of problem solving is controlled by text.

The following four subconcepts make a text-based architecture feasible. They are
indispensable to implementation of the text-based architecture. In other words, the
text-based architecture described above is a principle, and the following four
subconcepts are methodologies for implementing that principle.

(1) Combination of Case-based Structure and the Text Base

Current machine translation systems have limited translation capability, but they can
transform most text into a case-based structure with almost no constraints on the input
text. Moreover, these machine translation systems use very similar sets of cases, so it
is quite natural to use a case-based structure for general knowledge representation.

The problem in doing so, however, is that the case-based structure is too sensitive to
vocabulary and sentence structure, because words in the input text appear in the case-based
structure directly and the sentence structure sometimes affects the result.
Therefore, even if two texts have the same meaning, their case-based representations
may differ.

However,  if  the  system  can  accept  linguistic  knowledge,  the  problem  of  case-based 
representation can be solved. This is because if the system accepts linguistic
knowledge  as  well  as  domain-dependent  knowledge,  it  can  infer  the  meaning  of  a  text 
by  using  the  linguistic  knowledge  in  exactly  the  same  way  that  domain-dependent 
knowledge  is  used  for  problem  solving.  The  more  linguistic  knowledge  a  user  teaches 
the  system,  the  more  deeply  the  system  can  understand  a  text. 
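The following minimal sketch illustrates the idea with hypothetical names. A fact is stored as a case frame (a verb plus case slots). Linguistic knowledge such as "to behead a person implies to execute a person" is stored as implication links between verbs. A query that fails to match directly can still match after those links are followed. The frame layout, verb names, and functions are assumptions for illustration, not the FLINS representation. The constraint that the object must be a person is also omitted.

```python
def frame(verb, **cases):
    """Hypothetical case frame: a verb plus its case slots (object, agent, ...)."""
    return (verb, frozenset(cases.items()))


# A fact taught as text, e.g. "Louis 16 was beheaded", in case-based form.
FACTS = {frame("behead", object="Louis 16")}

# Linguistic knowledge, also taught as text:
# "to behead a person implies to execute a person",
# "to execute a person implies to kill a person".
IMPLIES = {("behead", "execute"), ("execute", "kill")}


def implied_verbs(verb):
    """Every verb reachable from `verb` through implication links, itself included."""
    seen, stack = {verb}, [verb]
    while stack:
        current = stack.pop()
        for src, dst in IMPLIES:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def direct_match(query):
    return query in FACTS


def paraphrase_match(query):
    q_verb, q_cases = query
    return any(q_verb in implied_verbs(verb) and cases == q_cases for verb, cases in FACTS)


query = frame("kill", object="Louis 16")  # "Was Louis 16 killed?"
print(direct_match(query))      # False: the words differ
print(paraphrase_match(query))  # True: behead -> execute -> kill
```

Removing the "execute implies kill" link makes the paraphrase match fail. This mirrors the point that the depth of understanding grows with the linguistic knowledge the user supplies.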

The combination of case-based structure and linguistic knowledge is a good alternative to
deep structure. With a deep semantic structure, text is transformed into a certain
structure at a fixed depth of meaning and is used in problem solving at that depth, as
indicated by the dotted line in Fig. 1. With a combined case-based structure and text
base, the depth of meaning is not fixed at all; rather, it changes as problem solving
progresses.

[Figure: depth of meaning against the progress of problem solving]
Fig. 1 Image of Text-based Inference

(2)  Semantic  and  Procedural  Primitives 

Initially, a text-based system knows only semantic and procedural primitives. Note
that our semantic primitives are completely different from those of CD theory [8].
Semantic primitives are words that have a special role in inference. For example,
"imply", "if", and "when" are semantic primitives. They are evaluated through
inference when paraphrasing a text or making a new goal event.

Procedural primitives are connected to subroutines and execute primitive functions, such
as inference pattern matching and database calls. Words like "system-call" and
"event-match" are examples of procedural primitives. When these procedural primitives
are evaluated under certain conditions, the procedure is called automatically. Inference
can be controlled by activating a procedural primitive. This means that a user can
use procedural primitives to specify how to infer or how to answer. This feature
enables the user to teach even meta-knowledge by using text. Moreover, it enables
the system to explain its own behavior, because the status of processing is also
represented by a case-based structure that can easily be transformed into natural
language.

The  inherent  knowledge  of  the  system  represented  in  special  form  consists  of  only 
these  two  types  of  primitives.  All  other  knowledge  acquired  through  interaction  with 
users  is  represented  as  text. 
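The following is a minimal sketch of how such primitives might be wired up. All of it rests on our own assumptions. Procedural primitives are words bound to subroutines. Semantic primitives are words the inference engine interprets itself. Every other word is ordinary text. The function bodies are placeholders for a real pattern matcher or database call. Each evaluation step is recorded as a tuple, so the system's behavior could later be rendered back into text.

```python
def event_match(pattern, text_base):
    """Placeholder primitive: return the stored frames whose verb equals `pattern`."""
    return [f for f in text_base if f[0] == pattern]


def system_call(command, text_base):
    """Placeholder primitive: stands in for an external call such as a database query."""
    return f"system-call: {command}"


PROCEDURAL = {"event-match": event_match, "system-call": system_call}
SEMANTIC = {"imply", "if", "when"}  # interpreted by the inference engine, never called


def evaluate(frame, text_base, trace):
    """Evaluate one (word, argument) frame and record what was done in `trace`."""
    word, argument = frame
    if word in PROCEDURAL:
        result = PROCEDURAL[word](argument, text_base)
        trace.append(("called", word, argument))
        return result
    kind = "semantic primitive" if word in SEMANTIC else "ordinary word"
    trace.append(("deferred", kind, word))
    return None


# A user-taught "how to answer" procedure is itself just a sequence of frames.
text_base = [("kill", "Louis 16"), ("give", "a book")]
trace = []
answer_procedure = [("event-match", "kill"), ("system-call", "display")]
results = [evaluate(f, text_base, trace) for f in answer_procedure]
print(results)  # [[('kill', 'Louis 16')], 'system-call: display']
print(trace)    # [('called', 'event-match', 'kill'), ('called', 'system-call', 'display')]
```

Because the control procedure is data rather than code, teaching a different way to answer means supplying different frames, not writing a new program.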

(3)  Network  Inference 

All  knowledge  in  a  text-based  system  is  represented  in  a  case-based  structure,  which 
looks  like  an  extended  semantic  network.  The  basic  function  of  inference  in  network 
inference  is  modeled  as  an  interaction  among  the  networks.  Inference  patterns  include 
direct  matching,  paraphrase  matching,  and  deductive  matching. 

As  mentioned  above,  all  knowledge  is  represented  as  text.  If  texts  in  the  text  base  are 
not combined into one network but rather exist as separate networks, the inference
mechanism has to combine the texts every time an input occurs, which could be a
serious  problem.  The  text  in  a  text  base  should  thus  be  combined  into  one  network. 
This  operation  is  equivalent  to  the  precomputation  of  inference  or  a  partial 
computation. 
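One way to read "precomputation of inference" is sketched below, as an assumption rather than the FLINS algorithm. Implication links taught in separate texts are merged into a single graph once, at teaching time. The set of nodes reachable from each node is then cached, so answering a query becomes a lookup instead of re-combining texts on every input. The phrases come from the "beheaded"/"executed" example in the text; the function name is hypothetical.

```python
from collections import defaultdict


def build_network(links):
    """Merge separately taught (source, target) implication links into one
    network and cache, for every source node, all nodes reachable from it."""
    graph = defaultdict(set)
    for src, dst in links:
        graph[src].add(dst)
    reachable = {}
    for start in list(graph):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reachable[start] = seen
    return reachable


# Texts taught at different times, each stored separately at first.
taught = [("be beheaded", "be executed"), ("be executed", "be killed")]
network = build_network(taught)  # done once, when the texts are taught

# At query time, a deductive match is a dictionary lookup.
print("be killed" in network["be beheaded"])  # True
```

The cost of combining is paid once per teaching act rather than once per input, which is the trade-off the paragraph above argues for.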




(4) Non-hierarchical Wondering Inference

The text base can be viewed as shown in Fig. 2. The primitive layer includes words
that have special meaning either semantically or procedurally. In the word layer,
linguistic knowledge is stored. For example, Fig. 2 shows that the word "beheaded" is
a form of the word "executed". It is possible to specify constraints on the relation.
For example, "to execute a person implies to kill a person" specifies that the object
of each verb should be a person. This layer can be seen as a word dictionary.
Domain-specific knowledge is stored in the knowledge layer and meta-knowledge is stored in
the meta-knowledge layer.


[Figure: the text base drawn as four layers: primitive, word, knowledge, and meta-knowledge. The primitive layer contains semantic primitives ("if", "imply", "when") and procedural primitives ("system-call", "event-matcher"); the word layer links "be beheaded", "be executed", and "to be killed".]

Fig. 2 Illusion of Knowledge Hierarchy


However,  it  is  easy  to  see  that  the  categorization  is  meaningless  from  the  viewpoint  of 
knowledge  representation  because  there  is  no  difference  among  the  knowledge 
representations.  All  knowledge  is  represented  in  the  case-based  structure.  Let's 
consider  the  sentence,  "If  a  user  asks  a  question,  the  system  must  answer  the 
question".  The  sentence  should  be  treated  as  meta-knowledge  for  reacting  to  the  user's 
query  and  should  be  treated  as  simple  factual  knowledge  for  answering  the  question, 
"What  do  you  do  if  a  user  asks  a  question?".  Therefore,  there  are  no  differences  among 
the  various  types  of  knowledge.  This  characteristic  causes  a  problem  in  inference 
control. 

In a conventional KE tool, meta-knowledge is represented in a special form and is
processed by a supervisory inference unit. This separation of ordinary knowledge and
meta-knowledge enables the system to control its behavior by applying the
meta-knowledge before the ordinary knowledge. However, such a mechanism cannot be
adopted here, because all knowledge in a text base is uniformly represented. Even if an
item of knowledge looks like ordinary knowledge (e.g., domain-dependent know-how), it
may be used as meta-knowledge by the fuzzy CBR. Since the fuzzy CBR
tends to interpret the meaning of a sentence broadly, the applicable range of the
sentence is greatly extended.




We designed a new inference mechanism that wonders "What should I do?" every
time an inference proceeds one step. All knowledge is used as meta-knowledge in the
wondering phase. For example, the knowledge "if a user asks a question, the system
must answer the question" is used as meta-knowledge during the wondering "what
should  I  do?"  that  follows  the  user's  inquiry.  The  truly  essential  points  are  to  embed 
the  meaning  of  "should",  "I",  and  "do"  as  semantic  or  procedural  primitives  and  to 
evoke  the  wondering  every  time  an  inference  proceeds  one  step. 
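The control loop described above can be sketched as a toy under stated assumptions. Knowledge items are reduced to (condition, action) string pairs. A condition "holds" when its string is present in the current situation. Performing an action adds hypothetical effects to the situation. The sketch illustrates only that no item is labeled as meta-knowledge in advance: every item is consulted at each step.

```python
# Hypothetical knowledge items, reduced to (condition, action) pairs taken from
# texts such as "if a user asks a question, the system must answer the question".
KNOWLEDGE = [
    ("user asks a question", "answer the question"),
    ("answer is found", "display the answer"),
]

# Hypothetical effects: what performing each action adds to the situation.
EFFECTS = {
    "answer the question": {"answer is found"},
    "display the answer": {"answer is displayed"},
}


def wonder(situation):
    """One "What should I do?" step: every knowledge item is consulted as
    meta-knowledge, and the actions whose condition currently holds are returned."""
    return [action for condition, action in KNOWLEDGE if condition in situation]


def run(situation, max_steps=10):
    """Wonder after every inference step until nothing new remains to be done."""
    situation, done = set(situation), []
    for _ in range(max_steps):
        pending = [action for action in wonder(situation) if action not in done]
        if not pending:
            break
        action = pending[0]
        done.append(action)
        situation |= EFFECTS.get(action, set())
    return done


print(run({"user asks a question"}))  # ['answer the question', 'display the answer']
```

The same item ("if a user asks a question, ...") could equally be returned as a fact in answer to "What do you do if a user asks a question?". Its role is decided at the moment of use, not by where it is stored.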

4.2  Fuzzy-centered  Architecture  (Positive  Use  of  Fuzziness) 

One  of  the  inherent  characteristics  of  natural  language  is  fuzziness.  This  fuzziness 
varies  from  the  ambiguity  that  exists  in  multiple  parse  trees  for  one  sentence  and 
multiple  meanings  for  one  word  to  the  fuzziness  of  the  meaning  itself. 
Disambiguation  algorithms  have  been  developed  to  cope  with  the  former  type  of 
fuzziness,  but  the  latter  type  has  not  been  considered  in  studies  of  natural  language 
understanding  systems. 

The  basic  concept  of  fuzzy-centered  architecture  (positive  use  of  fuzziness)  is  not  a 
concrete  architecture  but  a  philosophy.  It  insists  that  fuzziness  is  not  a  bad  feature  of 
natural  language  but  rather  a  very  good  feature.  In  disambiguation,  fuzziness  is 
handled  as  something  bad,  but  disambiguation  is  not  the  only  way  to  deal  with  the 
fuzziness  of  natural  language.  In  natural  language,  fuzziness  often  has  an  important 
meaning,  so  in  a  truly  human-friendly  system,  fuzziness  cannot  be  ignored. 

There are many areas in which fuzziness can play an important role in natural language
processing. As a first attempt toward implementing the fuzzy-centered architecture, we
focused on problem solving and learning, which make the best use of the fuzziness in
natural language.

(1)  Fuzziness  of  Meaning  in  Natural  Language 

What kind of fuzziness is there in natural language? Examples are fuzzy predicates
(e.g., "tall", "old"), fuzzy modifiers (e.g., "very", "more or less"), fuzzy modality
(e.g., "most of", "usually"), and fuzzy inference (e.g., generalized modus ponens [12],
gradual rules [2]).

Fuzziness  can  be  roughly  categorized  as  shown  in  the  table  in  Fig.  3.  Basically, 
fuzziness  can  be  divided  into  ambiguity  and  fuzziness  of  meaning.  For  example,  the 
ambiguous  word  "execute"  can  mean  "start  a  program"  or  "kill  a  person".  A  sentence 
is  ambiguous  when  it  has  multiple  parse  trees.  For  example,  "A  girl  saw  the  boy 
with  a  telescope"  can  be  interpreted  in  two  ways,  depending  on  who  has  the  telescope. 


                                                  Word            Sentence
Ambiguity                                         multi-meaning   multi-structure
Fuzziness of meaning, simple
  (well-structured universe of discourse)         "tall"          "the more ..., the more ..."
Fuzziness of meaning, complex
  (unknown universe of discourse)                 "handsome"      proverbs

Fig. 3 Classification of fuzziness


Fuzziness  in  the  meaning  of  a  word  can  be  classified  as  simple  or  complex.  Simple 
fuzziness  can  be  represented  with  a  well-structured  universe  of  discourse.  For  example, 
"tall"  can  be  defined  on  a  height  axis.  The  universe  of  discourse  for  "height"  is  the 
continuous  real  number  system.  On  the  other  hand,  it's  quite  difficult  to  define  such  a 
universe  of  discourse  for  a  word  like  "handsome". 
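As an illustration of simple fuzziness, the sketch below defines "tall" as a fuzzy set on a height axis in centimetres. It then applies Zadeh's classical hedges: "very" as squaring (concentration) and "more or less" as the square root (dilation). The 160 cm and 190 cm breakpoints are arbitrary assumptions. The paper does not specify its hedge definitions; the classical ones are used here.

```python
import math


def tall(height_cm):
    """Membership of a height in "tall": 0 up to 160 cm, 1 from 190 cm,
    linear in between (the breakpoints are illustrative assumptions)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30


def very(mu):
    return mu ** 2  # concentration


def more_or_less(mu):
    return math.sqrt(mu)  # dilation


for h in (160, 175, 190):
    print(h, tall(h), very(tall(h)), round(more_or_less(tall(h)), 3))
```

No comparable one-line membership function exists for "handsome". That is exactly the gap between simple and complex fuzziness.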




Similarly, fuzziness in the meaning of a sentence can also be classified as simple or
complex. An example of a simple fuzzy sentence is "the more xxx, the more yyy".
In this sentence, the relationship between xxx and yyy is fuzzy. In other words, "the
more ..., the more ..." is a fuzzy sentence structure. A more complex sentence, such as
a proverb, exhibits complex fuzziness.

The table in Fig. 4 shows the corresponding fuzzy theories that can be applied to the
types of fuzziness listed in Fig. 3. For example, the simple fuzziness of a word can
be processed by fuzzy set or fuzzy symbol theory. Generalized modus ponens [12] and
gradual inference [2] are applicable to the simple fuzziness of a sentence. We can see
that fuzzy theory provides us with ample methods for coping with the various types of
fuzziness found in natural language.


                                  Word                  Sentence
Ambiguity                         disambiguation by world knowledge and user interaction
Fuzziness of meaning, simple      Fuzzy Set,            Generalized Modus Ponens,
                                  Fuzzy Symbol          Gradual Inference
Fuzziness of meaning, complex     Fuzzy Rule, Fuzzy Inference (Fuzzy CBR)

Fig. 4 Application of fuzzy theory


(2) 3-Layered Fuzzy Inference Mechanism

Figure  5  shows  our  3-layered  fiizzy  inference  architecture  for  dealing  with  the  fuzziness 
in  Figs.  3  and  4  as  a  constituent  of  the  fuzzy-centered  architecture.  In  this  3-layered 
hierarchy  of  fuzzy  inference,  the  basic  inference  mechanism  is  ordinary  (non-fuzzy) 
inference,  that  is  regular  modus  ponens.  In  this  base  layer,  text  symbols  are  treated 
simply  as  labels. 

The second layer is fuzzy inference on a fuzzy set or fuzzy symbol. In this layer, a
symbol is treated either as a fuzzy set (i.e., a sort of distribution over a universe of
discourse) or as a fuzzy symbol (i.e., a word on which calculations can be performed).
A typical example for this level is deducing the result "the apple is very ripe" from
the gradual rule "the more an apple is red, the more the apple is ripe" and the
information "the apple is very red." Since this can be formalized as (A → B, A') ⇒ B',
where A' and B' are similar to A and B, this inference is a type of case-based
reasoning. Current CBR methodology gives only a vague outline of the inference
method, but the generalized modus ponens in fuzzy theory provides a concrete
algorithm for executing a type of CBR having the formalism (A → B, A') ⇒ B'.

The top layer is fuzzy case-based reasoning. It is CBR extended by fuzzy theory,
which tries to match two cases (which are also represented as text) on the basis of
their fuzzy relationship. This can be formalized as (A → B, C) ⇒ D, where C and D
are different from A and B. In the problem-solving phase, this mechanism is used in a
bottom-to-top manner, i.e. ordinary inference, fuzzy inference and fuzzy CBR are
invoked in that order. The learning phase, on the contrary, is a top-to-bottom process.
This fuzzy CBR can be applied to the control of conversation by treating the
conversation history as case data.


Fig.  5  Hierarchy  of  Fuzzy  Inference 
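The paper does not spell out the generalized modus ponens algorithm. The following Python sketch assumes Zadeh's compositional rule of inference (sup-min composition) on small discretized universes; the membership values, the use of squaring for "very", and the two implication operators are illustrative assumptions, not the authors' settings. It shows why the choice of implication matters for the apple example: with the Rescher-Gaines implication, often used to model gradual rules (cf. [2]), "very red" yields "very ripe", whereas the min operator only yields "ripe".

```python
from typing import Callable, List, Sequence

Implication = Callable[[float, float], float]

def rescher_gaines(a: float, b: float) -> float:
    """Gradual-rule implication: fully true when a <= b, false otherwise."""
    return 1.0 if a <= b else 0.0

def mamdani(a: float, b: float) -> float:
    """Min operator, the usual 'implication' of fuzzy controllers (for contrast)."""
    return min(a, b)

def generalized_modus_ponens(a: Sequence[float], b: Sequence[float],
                             a_prime: Sequence[float],
                             implies: Implication) -> List[float]:
    """Sup-min composition: B'(y) = max over x of min(A'(x), I(A(x), B(y)))."""
    return [max(min(ap, implies(ax, by)) for ax, ap in zip(a, a_prime))
            for by in b]

# Hypothetical discretization: five sample points on each axis.
red = [0.0, 0.25, 0.5, 0.75, 1.0]     # "red" on a redness axis
ripe = [0.0, 0.25, 0.5, 0.75, 1.0]    # "ripe" on a ripeness axis
very_red = [m ** 2 for m in red]      # Zadeh's concentration for "very"

print(generalized_modus_ponens(red, ripe, very_red, rescher_gaines))
# [0.0, 0.0625, 0.25, 0.5625, 1.0]  ("very ripe" = ripe squared)
print(generalized_modus_ponens(red, ripe, very_red, mamdani))
# [0.0, 0.25, 0.5, 0.75, 1.0]  (just "ripe")
```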




5.  Two  Experimental  Systems 

Here's a very short summary of the experimental systems. See [10, 11] for details.

5.1 A Text-based Natural Language System: AB-System [10]

An experimental text-based system, called the AB-System, was developed at Carnegie
Mellon University to demonstrate the feasibility of using a text-based architecture.

System Architecture

Figure 6 shows the system architecture. The Generalized LR Parser/Compiler
Version 8.4 and Generation Kit were used to implement the NL Parser and
Generator. Common Lisp was used to implement the pattern matcher. The
AB-System is currently running on an IBM-RT.

Knowledge  Representation 

The AB-System uses only eight cases: Actor, Purpose, Object, Cause, Instrument,
Location, Time, and Unknown; in addition, "Event-Modify" and "Describe" are added
to represent complex meanings.

Pattern  Matcher 

The  AB-System  only  knows  about  semantic  primitives  and  procedural  primitives. 
The  number  of  these  primitives  is  quite  limited,  so  the  AB-System  derives  most  of  its 
power  from  the  text  base  and  the  pattern  matcher.  The  key  part  of  the  pattern  matcher 
is  the  event  matcher,  which  tries  to  unify  two  events. 

In the pattern matcher, a new goal is sometimes generated for backward reasoning,
although the basic inference mechanism is forward reasoning (propagation of an input
text in the text base network). The generated goal is represented in a case-based
structure so that it can be handled in the same way as a user inquiry. Moreover, all
knowledge, including meta-knowledge, is represented in a case-based structure, so the
status of network inference is also represented as a case-based structure. This means
that the system can always explain its own activity by providing the text generator
with the inference status represented in the case-based structure.
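The event matcher is described only at the level of "unifying two events". A minimal sketch, assuming that a case-based structure can be modelled as a Python dict from case roles to fillers, that variables are strings beginning with "?", and that the event predicate sits under a hypothetical "Event" key (none of these conventions come from the AB-System itself):

```python
from typing import Dict, Optional

Bindings = Dict[str, object]

def is_var(x: object) -> bool:
    """Assumed convention: variables are strings starting with '?'."""
    return isinstance(x, str) and x.startswith("?")

def unify(x: object, y: object, b: Bindings) -> Optional[Bindings]:
    """Unify two fillers (atoms, variables or nested case frames); None on failure."""
    if is_var(x):
        return bind(x, y, b)
    if is_var(y):
        return bind(y, x, b)
    if isinstance(x, dict) and isinstance(y, dict):
        if x.keys() != y.keys():          # both events must fill the same case roles
            return None
        for role in x:
            result = unify(x[role], y[role], b)
            if result is None:
                return None
            b = result
        return b
    return b if x == y else None

def bind(var: str, value: object, b: Bindings) -> Optional[Bindings]:
    """Bind var to value, dereferencing existing bindings (no occurs check)."""
    if var in b:
        return unify(b[var], value, b)
    if is_var(value) and value in b:
        return unify(var, b[value], b)
    if value == var:
        return b
    return {**b, var: value}

# Hypothetical stored event and a query derived from a "When was ...?" question.
stored = {"Event": "execute", "Object": "Queen Antoinette", "Time": "1793"}
query = {"Event": "execute", "Object": "Queen Antoinette", "Time": "?when"}
print(unify(query, stored, {}))  # {'?when': '1793'}
```

Because goals, user inquiries and stored knowledge share one case-based form, the same `unify` call can serve all three in this sketch.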

Example  of  Communication 

The AB-System works in a mixed setting: question answering about the French
Revolution, knowledge debugging, and acting as a computer front-end. When a user
inputs a text, the AB-System transforms it into a case-based structure and checks
whether or not the knowledge is already known. If the knowledge is unknown, it stores
the text in the text base, transforms it back to natural language and displays "Okay.
I understood." followed by the generated text. Examples of the inputs are listed below.

"The French Revolution occurred in 1789 in France."

"Robespierre suspended the constitution and assumed the dictatorial powers."

"The article describes the person, if a person is described in a article."

"AB system-calls vi with the file in order to display a file, if the file is long."

"When was the 'Queen Antoinette' executed, if 'to be beheaded' means 'to be executed' ?"
"Show all knowledge about 'Queen Antoinette'."

"Display the article that describes 'the person who was executed in 1793'."


[Figure: End User, TEXT, Generalized LR Parser, Case-based Structure, Pattern Matcher, Generation Kit]


Fig.  6  AB-System  Architecture 


Fig.  7  Examples  of  the  inputs 




5.2 A Fuzzy Expert System: FOREX [4, 11]

We have built a fuzzy expert system that predicts the exchange rate trends of the yen
against the dollar based on 5000 fuzzy rules and 300 fuzzy frames. It makes full use of
fuzzy theory, i.e. fuzzy set theory, fuzzy logic, fuzzy measures and fuzzy integrals [6].

Overview of FOREX

FOREX consists of a state recognition part and a scenario evaluation part. First,
numerical data and news data are entered into the state recognition part and each data
item is converted into one or a collection of fuzzy qualitative linguistic values. Several
important indexes which strongly influence the foreign exchange market are deduced as
inputs to the scenario evaluation part. Second, the important indexes are used to select
the most suitable scenario that describes possible future changes in the exchange rate.
The special features are:

Separation of State Recognition Part and Scenario Evaluation Part

In the state recognition part, numerical data and news are analyzed and synthesized into
fuzzy qualitative indicators. Then, in the scenario evaluation part, the fuzzy qualitative
indicators are used to choose the most likely scenario stored in FOREX.

Four-layered  State  Recognition 

The state recognition part is divided into four levels, 0 to 3, the highest of which
represents the highest degree of abstraction. Raw numerical data in level 0 is evaluated
and translated into fuzzy data. Level 2 looks like a state transition network in which
each state stands for an aspect of economic fundamentals. News is given in level 2
directly. The state of the network is summarized into a few dozen indicators in level 3
for use by the scenario evaluation part. The state recognition part corresponds to the
middle layer of the 3-layered fuzzy inference mechanism.

State  Representation  by  3x3  Fuzzy  Linguistic  Qualitative  Values 

All states in levels 1 to 3 are represented by 3x3 linguistic fuzzy variables, i.e.
combinations of past/current/future and level/differential/quadratic differential.

Scenario  Evaluation  by  Fuzzy  Integral 

The  conditions  of  the  scenario  are  given  as  fuzzy  measures  on  the  state  values  in  level 
3.  Each  scenario  is  evaluated  by  fuzzy  integrals  and  sorted  by  fuzzy  ranking. 
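A hedged sketch of scenario evaluation, assuming the discrete Choquet integral (one of the fuzzy integrals discussed in [6]) with each scenario's conditions given as a fuzzy measure over two hypothetical level-3 indicators named "rate" and "trade". FOREX's actual indicators, measures and fuzzy-ranking method are not given here, so the final ordering below is a plain sort by integral value.

```python
from typing import FrozenSet, List, Mapping, Tuple

Measure = Mapping[FrozenSet[str], float]

def choquet(scores: Mapping[str, float], mu: Measure) -> float:
    """Discrete Choquet integral of scores (assumed in [0, 1]) w.r.t. fuzzy measure mu."""
    order = sorted(scores, key=scores.get)   # indicators by ascending score
    total, prev = 0.0, 0.0
    for i, x in enumerate(order):
        upper = frozenset(order[i:])         # indicators scoring at least scores[x]
        total += (scores[x] - prev) * mu[upper]
        prev = scores[x]
    return total

def rank_scenarios(state: Mapping[str, float],
                   measures: Mapping[str, Measure]) -> List[Tuple[str, float]]:
    """Evaluate every scenario on the same state and sort best first (crisp sort)."""
    evaluated = [(name, choquet(state, mu)) for name, mu in measures.items()]
    return sorted(evaluated, key=lambda pair: pair[1], reverse=True)

def fs(*names: str) -> FrozenSet[str]:
    return frozenset(names)

state = {"rate": 0.8, "trade": 0.4}          # hypothetical indicator values
measures = {                                 # hypothetical, monotone fuzzy measures
    "yen-up":   {fs(): 0.0, fs("rate"): 0.3, fs("trade"): 0.5, fs("rate", "trade"): 1.0},
    "yen-down": {fs(): 0.0, fs("rate"): 0.1, fs("trade"): 0.7, fs("rate", "trade"): 1.0},
}
print(rank_scenarios(state, measures))       # yen-up (about 0.52) before yen-down (about 0.44)
```

When the measure is additive, the Choquet integral reduces to a weighted sum; non-additive measures let a scenario reward or penalize indicators that move together.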

Treatment of Fuzziness

Although  FOREX  uses  various  fuzzy  methods,  we  found  that  the  most  effective 
methods  are  fuzzification  of  fuzzy  "words"  and  generalized  modus  ponens,  especially 
gradual  inference. 

Fuzzification of Words

Most of the state values that need to be represented in FOREX are of the type
summed up by the statement: "Roughly somewhat about here, but I'm not sure
exactly where". Such data are transformed into possibility distributions (fuzzy sets).
For example, "US-short-term-interest is somewhat high" can be represented by a
triangular possibility distribution with a value of one on the linguistic label "high"
and a value of less than one on other labels ("very high", "more-or-less high", and so
on). All linguistic statements and numerical data are converted to possibility
distributions over the linguistic labels. This transformation reflects the feeling that the
word "high" does not mean "strictly high" but "vaguely high".
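The fuzzification step can be sketched as follows. It assumes a hypothetical ordered scale of six linguistic labels and triangular distributions on that scale. The label set, the spread, and the 0 to 10 range used for the numeric example are illustrative, not FOREX's.

```python
from typing import Dict

# Hypothetical ordered label scale (FOREX's actual labels are not listed in the text).
LABELS = ["low", "more-or-less low", "medium", "more-or-less high", "high", "very high"]

def linguistic(center: str, spread: int = 1) -> Dict[str, float]:
    """Triangular possibility distribution over LABELS with value 1 on `center`."""
    c = LABELS.index(center)
    return {lab: max(0.0, 1.0 - abs(i - c) / (spread + 1))
            for i, lab in enumerate(LABELS)}

def numeric(x: float, lo: float, hi: float) -> Dict[str, float]:
    """Place a number from [lo, hi] on the same label axis (triangles of base width 2)."""
    pos = (min(max(x, lo), hi) - lo) / (hi - lo) * (len(LABELS) - 1)
    return {lab: max(0.0, 1.0 - abs(i - pos)) for i, lab in enumerate(LABELS)}

print(linguistic("high"))          # "somewhat high": 1.0 on "high", 0.5 on its neighbours
print(numeric(7.5, 0.0, 10.0))     # 0.25 on "more-or-less high", 0.75 on "high"
```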

Generalized  Modus  Ponens 

One of the most important features of fuzzy logic is that it can represent the fuzziness
in the relation between a condition and a consequent. When an expert says "If FF-rate
is high then US-short-term-interest is high", it is often equivalent to "The more
FF-rate is high, the more US-short-term-interest is high". That is the gradual nature of
the rule structure.




6.  Summary 

We  proposed  two  new  design  concepts,  text-based  architecture  and  fuzzy-centered 
architecture  (positive  use  of  fuzziness),  to  overcome  problems  and  limitations  of 
conventional  natural  language  understanding  systems,  including  (1)  the  closed  nature 
of  systems,  (2)  the  fact  that  implicit  and  inherent  knowledge  is  represented  in  a  special 
form,  (3)  deep  structure  representation  of  meaning,  and  (4)  neglect  of  the  fuzziness  of 
meaning. 

The  text-based  architecture  was  directly  derived  from  problems  (1),  (2)  and  (3).  The 
four  subconcepts,  (1)  combination  of  case-based  structure  and  text  base,  (2)  semantic 
and  procedural  primitives,  (3)  network  inference,  and  (4)  non-hierarchical  wondering 
inference,  are  proposed  to  make  a  text-based  architecture  feasible. 

The fuzzy-centered architecture is derived from the fact that most natural language
understanding systems can only cope with ambiguity, even though the fuzziness of the
meaning itself is quite important. The 3-layered hierarchy of fuzzy inference was
presented as a possible inference mechanism. The hierarchy consists of (1) ordinary
(non-fuzzy) inference, (2) fuzzy inference on fuzzy sets or fuzzy symbols, and (3) fuzzy
case-based reasoning.

Two experimental systems, the AB-System and FOREX, demonstrate the feasibility
of the new design concepts, although they do not cover the full range of the new concepts.
We are now doing fundamental research on the problems discussed above to fully
implement FLINS based on the two new design concepts.

References 

1. C. Dimacro and G. Hirt: Current Approaches To Natural Language Semantics,
Tutorial Proceedings, AAAI-90 (1990).

2. D. Dubois and H. Prade: Gradual Inference Rules in Approximate Reasoning,
Information Sciences, Vol. 61, pp. 103-122 (1992).

3. C.J. Fillmore: The Case for Case, in E. Bach and R.T. Harms (eds.): Universals
in Linguistics, Holt, Rinehart and Winston, New York, pp. 1-90 (1968).

4. Y. Katoh, H. Yuize, M. Yoneda, K. Takahashi, S. Tano, T. Yagyu, M. Grabish
and S. Fukami: Gradual Rules in a Decision Support System for Foreign
Exchange Trading, Int'l Conf. on Fuzzy Logic & Neural Net., pp. 625-628 (1992).

5. J.M. Levine: PRAGMA - A Flexible Bidirectional Dialogue System, AAAI-90,
pp. 964-969 (1990).

6. T. Murofushi and M. Sugeno: An Interpretation of Fuzzy Measures and the Choquet
Integral as an Integral with respect to a Fuzzy Measure, Fuzzy Sets and Systems, Vol. 29,
pp. 201-227 (1989).

7. S. Nirenburg: Machine Translation, Studies in Natural Language Processing,
Cambridge University Press (1987).

8. R.C. Schank: Conceptual Dependency: A Theory of Natural Language Analysis,
Cognitive Psychology, Vol. 3 (1972).

9. O. Stock: Natural Language and Exploration of an Information Space: the
ALFresco Interactive System, IJCAI-91, pp. 972-978 (1991).

10.  S.  Tano:  Natural  Language  Understanding  System  using  Text-based  Inference, 
CMU-CMT-91-124  (1991). 

11. S. Tano, H. Yuize, T. Yagyu, M. Yoneda, Y. Katoh, M. Grabish and S. Fukami:
FOREX: Foreign Exchange Trade Support Expert System, International Fuzzy
Engineering Symposium '91, pp. 1114-1115 (1991).

12. R.R. Yager: Fuzzy Sets and Applications: Selected Papers by L.A.
Zadeh, John Wiley & Sons (1988).




Boolean  Reasoning  for  Decision  Rules 
Generation 


Andrzej  Skowron 

Institute  of  Mathematics,  University  of  Warsaw 
02-097  Warsaw,  Banacha  2,  Poland 
e-mail: skowron@mimuw.edu.pl


Abstract. In the paper we investigate the problem of generating optimal decision
rules with some certainty coefficients based on belief [7] and rough membership
functions [6]. We show that the problems of optimal rule generation can be solved
by boolean reasoning [2].


1  Introduction 

In the paper we discuss the problem of generating optimal decision rules. Decision
rules have the following form: $\tau \Rightarrow \tau'$, where $\tau$ and $\tau'$ are boolean
combinations of descriptors built from conditions and from a decision approximating
the expert decision [9], respectively. The decision rules are generated with some
certainty coefficients expressed by the basic functions of evidence theory (basic
probability assignments, belief and plausibility functions) and by rough membership
functions computable from a given decision table. These coefficients can be used in
decision making. The method of rule generation is based on the construction of
appropriate boolean functions from modified discernibility matrices [10]. The forms
of the rules that are optimal with respect to the number of attributes occurring in them
are obtained from prime implicants of those functions. Two kinds of optimal decision
rules are considered: locally optimal rules and globally optimal rules. We show that our
method can also be applied to the construction of decision rules based on β-lower and
β-upper approximations of sets [12].

2 Information Systems and Rough Sets

Information systems [4, 5] are used for representing knowledge suitable for
appropriate decision making.

Rough sets have been introduced [4, 5] as a tool to deal with inexact, uncertain
or vague knowledge in artificial intelligence applications, for example knowledge-based
systems in medicine, natural language processing, pattern recognition, decision
systems and approximate reasoning.

In this section we recall some basic notions related to information systems
and rough sets.

An information system is a pair $\mathcal{A} = (U, A)$, where $U$ is a non-empty, finite set
called the universe and $A$ is a non-empty, finite set of attributes, i.e. $a : U \to V_a$
for $a \in A$, where $V_a$ is called the value set of $a$.




Elements of $U$ are called objects and interpreted as, e.g., cases, states, processes,
patients, observations. Attributes are interpreted as features, variables,
characteristic conditions, etc.

We consider a special case of information systems called decision tables. A
decision table is any information system of the form $\mathcal{A} = (U, A \cup \{d\})$, where
$d \notin A$ is a distinguished attribute called the decision. The elements of $A$ are called
conditions.

It is enough to consider decision tables with one decision because, by simple
coding, any decision table with more than one decision can always be transformed
into a decision table with exactly one decision. One can interpret a decision
attribute as a kind of classification of the universe of objects given by an expert,
decision-maker, operator, physician, etc.

The cardinality of the image $d(U) = \{k : d(s) = k \text{ for some } s \in U\}$ is called
the rank of $d$ and is denoted by $r(d)$. We assume, without loss of generality,
that the set $V_d$ of values of the decision $d$ is equal to $\{1, \ldots, r(d)\}$.

Let us observe that the decision $d$ determines the partition
$CLASS_{\mathcal{A}}(d) = \{X_1, \ldots, X_{r(d)}\}$ of the universe $U$, where $X_k = d^{-1}(\{k\})$
for $1 \le k \le r(d)$. $CLASS_{\mathcal{A}}(d)$ will be called the classification of objects in
$\mathcal{A}$ determined by the decision $d$.

Let $\mathcal{A} = (U, A)$ be an information system. With every subset of attributes
$B \subseteq A$, an equivalence relation, denoted by $IND_{\mathcal{A}}(B)$ (or $IND(B)$) and called the
B-indiscernibility relation, is associated and defined as follows:

$$IND(B) = \{(s, s') \in U^2 : \text{for every } a \in B,\ a(s) = a(s')\}$$

Objects $s, s'$ satisfying the relation $IND(B)$ are indiscernible by attributes
from $B$.
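For illustration, the equivalence classes of $IND(B)$ can be computed by grouping objects on their value vectors over $B$. The six-object table below is a hypothetical toy example with conditions a, b and decision d. Objects are numbered 0 to 5.

```python
from collections import defaultdict

# Hypothetical toy decision table: object index -> {attribute: value}.
TABLE = [
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 2},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 0, "d": 2},
    {"a": 1, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 1},
]

def ind_classes(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on every attribute in attrs."""
    groups = defaultdict(set)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())

print(ind_classes(TABLE, ["a", "b"]))  # [{0, 1}, {2, 5}, {3}, {4}]
```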

Let $\mathcal{A}$ be an information system with $n$ objects. By $M(\mathcal{A})$ [10] we denote an
$n \times n$ matrix $(c_{ij})$, called the discernibility matrix of $\mathcal{A}$, such that

$$c_{ij} = \{a \in A : a(x_i) \neq a(x_j)\} \quad \text{for } i, j = 1, \ldots, n.$$

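A discernibility function $f_{\mathcal{A}}$ for an information system $\mathcal{A}$ is a boolean function
of $m$ boolean variables $\bar{a}_1, \ldots, \bar{a}_m$ corresponding to the attributes $a_1, \ldots, a_m$,
respectively, and defined as follows:

$$f_{\mathcal{A}}(\bar{a}_1, \ldots, \bar{a}_m) = \bigwedge \{\bigvee \bar{c}_{ij} : 1 \le j < i \le n,\ c_{ij} \neq \emptyset\}$$

where $\bar{c}_{ij} = \{\bar{a} : a \in c_{ij}\}$.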

It can be shown [10] that the set of all prime implicants of $f_{\mathcal{A}}$ determines the
set of all reducts of $\mathcal{A}$. Here we apply an analogous method to decision rule
generation by generalizing the notion of a discernibility matrix. The modified
discernibility matrix $MG(\mathcal{A})$ is a subset of $\mathcal{P}(A) \times \{1, \ldots, n\} \times \{1, \ldots, n\}$
computable from $M(\mathcal{A})$ (where $\mathcal{P}(A)$ denotes the power set of $A$). By $f_{MG(\mathcal{A})}$
we denote a boolean function constructed from $MG(\mathcal{A})$ in the same way as
$f_{\mathcal{A}}$ is constructed from $M(\mathcal{A})$. Different forms of decision rules are obtained from
different constructions of the sets $MG(\mathcal{A})$ (see Section 4).
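Because $f_{\mathcal{A}}$ is a monotone CNF with no negated variables, its prime implicants are exactly the minimal attribute sets that intersect every non-empty $c_{ij}$. The brute-force Python sketch below uses that fact to compute reducts on a hypothetical three-object system. It enumerates candidate sets by increasing size, so it is only practical for a handful of attributes.

```python
from itertools import combinations

# Hypothetical toy information system with three attributes.
OBJECTS = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 1, "c": 1},
]

def discernibility_matrix(objs, attrs):
    """c_ij = attributes on which objects i and j differ, for j < i."""
    return {(i, j): frozenset(a for a in attrs if objs[i][a] != objs[j][a])
            for i in range(len(objs)) for j in range(i)}

def prime_implicants(clauses):
    """Prime implicants of the monotone CNF AND(OR c), i.e. minimal hitting sets."""
    clauses = [c for c in clauses if c]        # empty c_ij are skipped, as in f_A
    universe = sorted(set().union(*clauses))
    found = []
    for k in range(len(universe) + 1):         # smallest candidates first
        for cand in combinations(universe, k):
            s = set(cand)
            if all(s & c for c in clauses) and not any(p <= s for p in found):
                found.append(frozenset(s))
    return found

def reducts(objs, attrs):
    return prime_implicants(discernibility_matrix(objs, attrs).values())

print(reducts(OBJECTS, ["a", "b", "c"]))  # three reducts: {a, b}, {a, c}, {b, c}
```

Here $f_{\mathcal{A}} = (\bar{a} \vee \bar{b}) \wedge (\bar{b} \vee \bar{c}) \wedge (\bar{a} \vee \bar{c})$, and each pair of attributes still discerns all three objects.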




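If $\mathcal{A} = (U, A)$ is an information system, $B \subseteq A$ is a set of attributes and
$X \subseteq U$ is a set of objects, then the sets

$$\{s \in U : [s]_B \subseteq X\} \quad \text{and} \quad \{s \in U : [s]_B \cap X \neq \emptyset\}$$

are called the B-lower and B-upper approximation of $X$ in $\mathcal{A}$, and they are
denoted by $\underline{B}X$ and $\overline{B}X$, respectively.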

The set $BN_B(X) = \overline{B}X - \underline{B}X$ will be called the B-boundary of $X$. When
$B = A$ we also write $BN_{\mathcal{A}}(X)$ instead of $BN_A(X)$.

Sets which are unions of some classes of the indiscernibility relation $IND(B)$
are called definable by $B$. Some subsets (categories) of objects in an information
system cannot be expressed exactly by employing available attributes, but they
can be roughly defined. The set $X$ is B-definable if $\underline{B}X = \overline{B}X$.

The set $\underline{B}X$ is the set of all elements of $U$ which can be classified with certainty
as elements of $X$, given the knowledge represented by attributes from
$B$; $\overline{B}X$ is the set of elements of $U$ which can be classified as possibly
belonging to $X$, employing the knowledge represented by attributes from $B$; the set
$BN_B(X)$ is the set of elements which cannot be classified either to $X$ or to $-X$
given the knowledge $B$.
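The approximations follow directly from the $IND(B)$ classes. The sketch below assumes a hypothetical six-object toy table with conditions a, b and decision d. $X$ is taken to be the set of objects whose decision is 1, and the function returns the B-lower approximation, the B-upper approximation and the B-boundary.

```python
from collections import defaultdict

# Hypothetical toy decision table: object index -> {attribute: value}.
TABLE = [
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 2},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 0, "d": 2},
    {"a": 1, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 1},
]

def ind_classes(table, attrs):
    """Equivalence classes of IND(attrs)."""
    groups = defaultdict(set)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())

def approximations(table, attrs, X):
    """Return (B-lower, B-upper, B-boundary) of the object set X."""
    lower, upper = set(), set()
    for cls in ind_classes(table, attrs):
        if cls <= X:          # class entirely inside X: certainly in X
            lower |= cls
        if cls & X:           # class meets X: possibly in X
            upper |= cls
    return lower, upper, upper - lower

X1 = {i for i, row in enumerate(TABLE) if row["d"] == 1}
print(approximations(TABLE, ["a", "b"], X1))  # ({2, 4, 5}, {0, 1, 2, 4, 5}, {0, 1})
```

$X_1$ is therefore not $\{a, b\}$-definable, because objects 0 and 1 share their condition values but carry different decisions.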

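Every information system $\mathcal{A} = (U, A)$ determines an information function

$$Inf_{\mathcal{A}} : U \to \mathcal{P}\Big(A \times \bigcup_{a \in A} V_a\Big)$$

defined as $Inf_{\mathcal{A}}(x) = \{(a, a(x)) : a \in A\}$. The set $\{Inf_{\mathcal{A}}(x) : x \in U\}$ is
called the $\mathcal{A}$-information set and is denoted by $INF(\mathcal{A})$.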

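Let $\mathcal{A} = (U, A)$ be an information system and let $\emptyset \neq X \subseteq U$. The rough
$\mathcal{A}$-membership function of the set $X$ (or rm-function, for short), denoted by $\mu_X^{\mathcal{A}}$,
is defined as:

$$\mu_X^{\mathcal{A}}(x) = \frac{|[x]_A \cap X|}{|[x]_A|} \quad \text{for } x \in U.$$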


One  can  observe  that  this  is  a  generalization  of  the  set  characteristic  function. 
Properties  of  rm-functions  are  discussed  in  [6]. 

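For every $X \subseteq U$ the rough $\mathcal{A}$-information function $\mu_X^{*}$ is defined by:

$$\mu_X^{*}(u) = \mu_X^{\mathcal{A}}(x) \quad \text{where } u \in INF(\mathcal{A}) \text{ and } Inf_{\mathcal{A}}(x) = u.$$

This does not depend on the choice of $x$, since objects with the same information belong to the same class of $IND(A)$.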
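A sketch of the rm-function and its information-indexed form, using exact fractions. The table is a hypothetical six-object toy example. The helper `info` plays the role of $Inf_{\mathcal{A}}$ restricted to the chosen attributes.

```python
from fractions import Fraction

# Hypothetical toy decision table: object index -> {attribute: value}.
TABLE = [
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 2},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 0, "d": 2},
    {"a": 1, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 1},
]

def info(table, attrs, x):
    """Inf(x): the set of (attribute, value) pairs describing object x."""
    return frozenset((a, table[x][a]) for a in attrs)

def rough_membership(table, attrs, X, x):
    """mu_X(x) = |[x] & X| / |[x]|, where [x] is the IND(attrs)-class of x."""
    cls = {i for i in range(len(table)) if info(table, attrs, i) == info(table, attrs, x)}
    return Fraction(len(cls & X), len(cls))

def rough_info_membership(table, attrs, X, u):
    """mu*_X(u): the same value, indexed by an information vector u from INF(A)."""
    x = next(i for i in range(len(table)) if info(table, attrs, i) == u)
    return rough_membership(table, attrs, X, x)

X1 = {0, 2, 4, 5}                                   # objects with decision 1
print(rough_membership(TABLE, ["a", "b"], X1, 0))   # 1/2
```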


3  Rough  Set  Interpretation  of  the  Basic  Functions  of  the 
Evidence  Theory 

Classification problems are central to the rough set approach [5] as well as
to the evidence-theoretic approach [7].

The fundamental assumption in the rough set approach is the following:
the objects from the universe are perceived only through the accessible information
about them, i.e. the values of attributes which can be evaluated on these
objects. Objects with the same information are indiscernible. Consequently,
the classification of objects is based on the accessible information about them,
not on the objects themselves. Together with the information about objects from a
finite set of objects, their classification delivered by an expert is given.
The classification problem in this case is related to the question of to what extent
it is possible to reflect, by the accessible (condition) attributes, the classification done
by the expert.

In evidence theory [7] the information about sets creating a partition
is embedded directly in some numerical functions, whereas in the rough
set approach the information about classified sets and objects is included in a
decision table. The evidence theory approach is based on the idea of assigning a
number from the interval [0, 1], given e.g. by an expert, to indicate a degree of
belief in a given proposition on the basis of given evidence.

We show [9] how to compute the basic functions of evidence theory from a
given decision table. First we recall some basic notions of evidence theory [7].

A frame of discernment $\Theta$ is a finite non-empty set.

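The basic probability assignment (bpa) on $\Theta$ is any function $m : \mathcal{P}(\Theta) \to \mathbb{R}_+$,
where $\mathbb{R}_+$ is the set of nonnegative reals, satisfying the following two conditions:

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \subseteq \Theta} m(A) = 1$$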

For a given bpa $m$ two functions are defined.

A function $Bel : \mathcal{P}(\Theta) \to \mathbb{R}_+$ is called the belief function over $\Theta$ (generated
by $m$) iff for any $\theta \subseteq \Theta$

$$Bel(\theta) = \sum_{A \subseteq \theta} m(A)$$

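A function $Pl : \mathcal{P}(\Theta) \to \mathbb{R}_+$ is called the plausibility function over $\Theta$ (generated
by $m$) iff for any $\theta \subseteq \Theta$

$$Pl(\theta) = \sum_{A \cap \theta \neq \emptyset} m(A)$$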

Let us observe that the inequality $Bel(\theta) + Bel(\Theta - \theta) \le 1$ cannot in
general be reduced to the equality $Bel(\theta) + Bel(\Theta - \theta) = 1$ (whereas
$P(\theta) + P(\Theta - \theta) = 1$ for any probability function $P$ on $\mathcal{P}(\Theta)$). This makes it possible
to take ignorance into account, e.g. if we have no evidence at all, either for or against $\theta$,
then $Bel(\theta) = Bel(\Theta - \theta) = 0$.

The plausibility function $Pl$ is definable by the belief function, namely
$Pl(\theta) = 1 - Bel(\Theta - \theta)$ for $\theta \subseteq \Theta$.
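The definitions of a bpa, $Bel$ and $Pl$ translate directly into code. In the sketch below, subsets of $\Theta$ are frozensets and masses are exact fractions. The three-element frame and the mass values are hypothetical. The tests confirm $Pl(\theta) = 1 - Bel(\Theta - \theta)$ on every subset and the total-ignorance case.

```python
from fractions import Fraction
from itertools import combinations

def subsets(frame):
    """All subsets of a finite frame, as frozensets."""
    items = sorted(frame)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def is_bpa(m) -> bool:
    """m(empty set) = 0 and the masses sum to 1."""
    return m.get(frozenset(), 0) == 0 and sum(m.values()) == 1

def bel(m, theta):
    """Bel(theta): total mass of focal sets contained in theta."""
    return sum((v for s, v in m.items() if s <= theta), Fraction(0))

def pl(m, theta):
    """Pl(theta): total mass of focal sets meeting theta."""
    return sum((v for s, v in m.items() if s & theta), Fraction(0))

THETA = frozenset({"up", "down", "flat"})          # hypothetical frame of discernment
m = {frozenset({"up"}): Fraction(1, 2),
     frozenset({"up", "down"}): Fraction(1, 4),
     THETA: Fraction(1, 4)}

print(bel(m, frozenset({"up"})), pl(m, frozenset({"up"})))   # 1/2 1
```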

Now we will describe how these functions can be defined and interpreted
on the basis of the rough set approach. For a given decision table one can define a
new classification of objects approximating the classification given by the decision
attribute. The approximation is constructed on the basis of the conditions in the decision
table.

The set $\Theta_{\mathcal{A}} = \{1, \ldots, r(d)\}$ determined by the decision $d$ in $\mathcal{A} = (U, A \cup \{d\})$
is called the frame of discernment defined by $d$ in $\mathcal{A}$.

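We say that the frame of discernment $\Theta$ is compatible with $\mathcal{A}$ if $r(d) = |\Theta|$.
If $\Theta = \{\theta_1, \ldots, \theta_k\}$ is compatible with $\Theta_{\mathcal{A}}$, then $\chi$ denotes the bijection between
$\Theta$ and $\Theta_{\mathcal{A}}$ defined by $\chi(\theta_i) = i$ for $i = 1, \ldots, k$, extended to subsets of $\Theta$ by
$\chi(\theta) = \{i : \theta_i \in \theta\}$, where $\theta \subseteq \Theta$.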

The objects from the universe $U$ of $\mathcal{A}$ can be classified employing the knowledge
represented by the conditions from $A$. This allows us to decide whether an
object belongs to the lower approximation of a given set $X \subseteq U$, or is in the
complement of the upper approximation of $X$, or belongs to the boundary region
corresponding to $X$. Moreover, one can, in some sense, better classify objects
from the boundary regions. This is based on the following observation [9]:

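Proposition 3.1 Let $\mathcal{A} = (U, A \cup \{d\})$ be a decision table. The family of all
non-empty sets from

$$\{\underline{A}X_1, \ldots, \underline{A}X_{r(d)}\} \cup \{Bd_{\mathcal{A}}(\theta) : \theta \subseteq \Theta_{\mathcal{A}} \text{ and } |\theta| > 1\}$$

(where $Bd_{\mathcal{A}}(\theta) = \bigcap_{i \in \theta} \overline{A}X_i - \bigcup_{i \notin \theta} \overline{A}X_i$) is a partition of the
universe $U$. Moreover, the following equality holds:

$$\bigcup_{i \in \theta} \underline{A}X_i \cup \bigcup_{\theta' \subseteq \theta,\ |\theta'| > 1} Bd_{\mathcal{A}}(\theta') = \underline{A}\bigcup_{i \in \theta} X_i, \quad \text{for } \theta \subseteq \Theta_{\mathcal{A}} \text{ with } |\theta| > 1.$$

□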


The classification (partition) of the universe $U$ described in Proposition 3.1
is called the standard classification of $U$ approximating in $\mathcal{A}$ the classification
$CLASS_{\mathcal{A}}(d)$ (given by the expert).

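By $APP\text{-}CLASS_{\mathcal{A}}(d)$ we denote the family:

$$\{\underline{A}X_1, \ldots, \underline{A}X_{r(d)}\} \cup \{Bd_{\mathcal{A}}(\theta) : \theta \subseteq \Theta_{\mathcal{A}} \text{ and } |\theta| > 1\}$$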


We have a clear interpretation of the new classification. An object from the
universe $U$ of $\mathcal{A}$ is represented by its information (from $INF(\mathcal{A})$) described in
the rows of $\mathcal{A}$. This object can be classified exactly on the basis of that information
only when the category (i.e. the equivalence class of the indiscernibility
relation $IND_{\mathcal{A}}(A)$) corresponding to that information is included in $X_i$ for some
$i$. Otherwise, that category is included in a boundary region of the form $Bd_{\mathcal{A}}(\theta)$
for some $\theta$. Then the considered object from $U$, represented by the given information,
can be classified to the boundary region of all sets $X_i$, where $i \in \theta$ (i.e. it
can be classified into the union $\bigcup_{i \in \theta} X_i$, but there is not enough information
either to decide in which of the sets $X_i$ it is or to eliminate some hypotheses from
$\{X_i : i \in \theta\}$).

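There is a natural correspondence between subsets of $\Theta_{\mathcal{A}}$ and elements of
$APP\text{-}CLASS_{\mathcal{A}}(d)$, which can be expressed by the following function:

$$F_{\mathcal{A}}(\theta) = \begin{cases} \underline{A}X_i & \text{if } \theta = \{i\} \text{ for some } i \ (1 \le i \le r(d)), \\ \emptyset & \text{if } \theta = \emptyset, \\ Bd_{\mathcal{A}}(\theta) & \text{if } |\theta| > 1. \end{cases}$$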




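Now we can define the function $\partial_{\mathcal{A}} : U \to \mathcal{P}(\Theta_{\mathcal{A}})$ such that $\partial_{\mathcal{A}}(s)$ is the unique
subset $\theta$ of $\Theta_{\mathcal{A}}$ such that $s \in F_{\mathcal{A}}(\theta)$. The function $\partial_{\mathcal{A}}$ can be treated as a new
decision attribute (defined by the conditions in $\mathcal{A}$) approximating the decision $d$.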

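Let $\Theta = \{\theta_1, \ldots, \theta_k\}$ be a frame of discernment compatible with $\mathcal{A}$ and let
$\chi : \Theta \to \Theta_{\mathcal{A}}$ be the standard bijection between $\Theta$ and $\Theta_{\mathcal{A}}$, i.e. $\chi(\theta_i) = i$ for
$i = 1, \ldots, k$. The function $m_{\mathcal{A}} : \mathcal{P}(\Theta) \to \mathbb{R}_+$, called the standard basic probability
assignment (defined by $\Theta$, $\mathcal{A}$ and $\chi$), is defined as:

$$m_{\mathcal{A}}(\theta) = \frac{|F_{\mathcal{A}}(\chi(\theta))|}{|U|} \quad \text{for any } \theta \subseteq \Theta.$$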


Proposition 3.2 [9] The function $m_{\mathcal{A}}$ defined above is a basic probability
assignment (in the sense of evidence theory).

□

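The belief function $Bel_{\mathcal{A}}$ for the frame of discernment $\Theta$ and the decision table
$\mathcal{A}$ compatible with $\Theta$ is defined as:

$$Bel_{\mathcal{A}}(\theta) = \sum_{\theta' \subseteq \theta} m_{\mathcal{A}}(\theta')$$

where $\theta \subseteq \Theta$ and $m_{\mathcal{A}}$ is the standard basic probability assignment for $\Theta$ and $\mathcal{A}$.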


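Theorem 3.3 Let $\Theta$ be a frame of discernment compatible with the decision
table $\mathcal{A} = (U, A \cup \{d\})$ and let $\chi$ be the standard bijection between $\Theta$ and $\Theta_{\mathcal{A}}$.
For any $\theta \subseteq \Theta$ the following equality holds:

$$Bel_{\mathcal{A}}(\theta) = \frac{|\underline{A}\bigcup_{i \in \chi(\theta)} X_i|}{|U|}$$

□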

The belief function $Bel_{\mathcal{A}}$ is Bayesian iff all sets from $CLASS_{\mathcal{A}}(d)$ are
definable by the set $A$ of conditions. In particular, the belief function $Bel_{\mathcal{A}'}$, where
$\mathcal{A}' = (U, A \cup \{\partial_{\mathcal{A}}\})$, is Bayesian. □

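Corollary 3.4 Let $\Theta$ be a frame of discernment compatible with the decision
table $\mathcal{A} = (U, A \cup \{d\})$. For any $\theta \subseteq \Theta$ the following equality holds:

$$Pl_{\mathcal{A}}(\theta) = \frac{|\overline{A}\bigcup_{i \in \chi(\theta)} X_i|}{|U|}$$

□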
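Theorem 3.3 and Corollary 3.4 can be checked numerically. In this sketch the frame $\Theta$ is identified directly with the decision values, so $\chi$ is the identity; this is an assumption made for brevity. $F_{\mathcal{A}}(\theta)$ is obtained as the set of objects whose generalized decision $\partial_{\mathcal{A}}$ equals $\theta$. By Proposition 3.1, this set is $\underline{A}X_i$ when $\theta = \{i\}$ and $Bd_{\mathcal{A}}(\theta)$ when $|\theta| > 1$.

The decision table is a hypothetical six-object example. In it, $\underline{A}X_1 = \{x_2, x_4, x_5\}$ and $\overline{A}X_1 = \{x_0, x_1, x_2, x_4, x_5\}$, so the theorem predicts $Bel_{\mathcal{A}}(\{1\}) = 3/6$ and $Pl_{\mathcal{A}}(\{1\}) = 5/6$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy decision table: conditions a, b; decision d with values 1, 2.
TABLE = [
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 2},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 0, "d": 2},
    {"a": 1, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 1},
]
CONDITIONS = ["a", "b"]

def generalized_decision(table, conds, dec="d"):
    """For each object x, the set of decision values occurring in its IND(conds)-class."""
    seen = defaultdict(set)
    for row in table:
        seen[tuple(row[a] for a in conds)].add(row[dec])
    return [frozenset(seen[tuple(row[a] for a in conds)]) for row in table]

def standard_bpa(table, conds, dec="d"):
    """m(theta) = |F(theta)| / |U|, with F(theta) = objects whose generalized decision is theta."""
    counts = defaultdict(int)
    for theta in generalized_decision(table, conds, dec):
        counts[theta] += 1
    return {theta: Fraction(c, len(table)) for theta, c in counts.items()}

def bel(m, theta):
    return sum((v for s, v in m.items() if s <= theta), Fraction(0))

def pl(m, theta):
    return sum((v for s, v in m.items() if s & theta), Fraction(0))

m = standard_bpa(TABLE, CONDITIONS)
print(bel(m, {1}), pl(m, {1}))   # 1/2 5/6
```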




4  Decision  Rules  Generation 

In  this  section  we  investigate  different  types  of  decision  rules  (for  a  given  decision 
table)  with  some  certainty  coefficients.  These  coefficients  are  computable  from 
a given decision table and can be used in decision making.

Let $\mathcal{A} = (U, A \cup \{d\})$ be a decision table and let $V = \bigcup_{a \in A \cup \{d\}} V_a$.
The atomic formulas over $B \subseteq A \cup \{d\}$ and $V$ are expressions of the form
$a = v$, called descriptors over $B$, where $a \in B$ and $v \in V_a$. The set $\mathbb{F}(B, V)$
of formulas over $B$ is the least set containing all atomic formulas over $B$ and
closed with respect to the classical propositional connectives $\vee$ (disjunction), $\wedge$
(conjunction) and $\neg$ (negation).

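Let $\tau \in \mathbb{F}(B, V)$ (where $B \subseteq A \cup \{d\}$); then by $\tau_{\mathcal{A}}$ we denote the meaning of
$\tau$ in the decision table $\mathcal{A}$, i.e. the set of all objects in $U$ with property $\tau$, defined
inductively as follows:

1. if $\tau$ is of the form $a = v$, then $\tau_{\mathcal{A}} = \{s \in U : a(s) = v\}$;

2. $(\tau \wedge \tau')_{\mathcal{A}} = \tau_{\mathcal{A}} \cap \tau'_{\mathcal{A}}$; $(\tau \vee \tau')_{\mathcal{A}} = \tau_{\mathcal{A}} \cup \tau'_{\mathcal{A}}$; $(\neg\tau)_{\mathcal{A}} = U - \tau_{\mathcal{A}}$.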

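The set $\mathbb{F}(A, V)$ is called the set of condition formulas in $\mathcal{A}$ and is denoted by
$\mathcal{C}_{\mathcal{A}}$. The set $\mathbb{F}(\{\partial_{\mathcal{A}}\}, V')$, where $V'$ is the set of values of $\partial_{\mathcal{A}}$, is called the set of decision
formulas in $\mathcal{A}^* = (U, A \cup \{\partial_{\mathcal{A}}\})$ and is denoted by $\mathcal{D}_{\mathcal{A}}$.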

If $u = \{(a_1, v_1), \ldots, (a_r, v_r)\}$, then by $\wedge u$ we denote the conjunction
$(a_1 = v_1) \wedge \ldots \wedge (a_r = v_r)$. By $A(\tau)$ we denote the set of attributes occurring in the
formula $\tau$ or corresponding to boolean variables occurring in the formula $\tau$.
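A decision rule for $\mathcal{A}$ is any expression of the form

$$\tau \Rightarrow \tau' \quad \text{where } \tau \in \mathcal{C}_{\mathcal{A}} \text{ and } \tau' \in \mathcal{D}_{\mathcal{A}}.$$

The decision rule $\tau \Rightarrow \tau'$ for $\mathcal{A}$ is true in $\mathcal{A}$ iff $\tau_{\mathcal{A}} \subseteq \tau'_{\mathcal{A}^*}$; if $\tau_{\mathcal{A}} = \tau'_{\mathcal{A}^*}$, then we
say that the rule is $\mathcal{A}$-exact.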
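The inductive definition of $\tau_{\mathcal{A}}$ amounts to a small recursive evaluator. In this sketch, formulas are nested tuples, which is a hypothetical encoding. The table already contains a column `dA` holding $\partial_{\mathcal{A}}$, so one table plays the role of both $\mathcal{A}$ and $\mathcal{A}^*$. The `dA` values are the generalized decisions of a hypothetical six-object table.

```python
# Hypothetical table A*: conditions a, b plus the generalized decision dA.
TABLE_STAR = [
    {"a": 1, "b": 0, "dA": frozenset({1, 2})},
    {"a": 1, "b": 0, "dA": frozenset({1, 2})},
    {"a": 0, "b": 1, "dA": frozenset({1})},
    {"a": 0, "b": 0, "dA": frozenset({2})},
    {"a": 1, "b": 1, "dA": frozenset({1})},
    {"a": 0, "b": 1, "dA": frozenset({1})},
]

# Formula encoding: ("=", attr, value) | ("and", f, g) | ("or", f, g) | ("not", f)
def meaning(table, f):
    """The set of objects satisfying formula f, following the inductive definition."""
    op = f[0]
    if op == "=":
        _, attr, value = f
        return {i for i, row in enumerate(table) if row[attr] == value}
    if op == "and":
        return meaning(table, f[1]) & meaning(table, f[2])
    if op == "or":
        return meaning(table, f[1]) | meaning(table, f[2])
    if op == "not":
        return set(range(len(table))) - meaning(table, f[1])
    raise ValueError(f"unknown connective {op!r}")

def rule_status(table, tau, tau_prime):
    """'exact' if the meanings coincide, 'true' if tau's meaning is included, else 'false'."""
    lhs, rhs = meaning(table, tau), meaning(table, tau_prime)
    if lhs == rhs:
        return "exact"
    return "true" if lhs <= rhs else "false"

print(rule_status(TABLE_STAR, ("=", "b", 1), ("=", "dA", frozenset({1}))))  # exact
```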

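We are looking for two optimal forms of those rules with respect to the
number of attributes on the left-hand side. The $\mathcal{A}$-exact rule $\tau \Rightarrow \tau'$ is $\mathcal{A}$-locally
optimal iff there is no $\mathcal{A}$-exact rule $\tau'' \Rightarrow \tau'$ such that $A(\tau'') \subset A(\tau)$. The
$\mathcal{A}$-exact rule $\tau \Rightarrow \tau'$ is $\mathcal{A}$-globally optimal iff $|A(\tau)| \le |A(\tau'')|$ for any $\mathcal{A}$-exact rule
$\tau'' \Rightarrow \tau'$.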

We obtain the optimal rules by applying the method based on modified
discernibility matrices mentioned in Section 2.

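Decision rules of the first type have the following form:

$$\tau \Rightarrow \partial_{\mathcal{A}} = \Delta, \quad \text{where } \Delta \subseteq \{1, \ldots, r(d)\} \text{ and } \tau \in \mathcal{C}_{\mathcal{A}}$$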

One  can  prove  the  following  theorem: 

Theorem  4.1  Lei  A  =  (U,  A  U  {d})  be  a  decision  table,  AC  r(d)}  and 

let  the  modified  discernibility  matrix  MG\{A)  (for  a  given  A)  be  equal  to 

{cij  -  {d}  :  Cij  £  M{A)  k  {di,(xi)  =  A  xor  di,(xj)  -  Zl)} 

Then  for  any  prime  implicant  t  of  f,\fc,{£)  we  have: 




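1. $(\partial_A = \Delta)_{\mathbb{A}^{\partial}} = \left(\bigvee\{\tau_u : u \in INF_1(A(t), \Delta, \mathbb{A}^{\partial})\}\right)_{\mathbb{A}}$,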

where $INF_1(A(t), \Delta, \mathbb{A}^{\partial}) = \{\{(a, a(x)) : a \in A(t)\} : \partial_A(x) = \Delta \;\&\; x \in U\}$;

2. $\bigvee\{\tau_u : u \in INF_1(A(t), \Delta, \mathbb{A}^{\partial})\} \Rightarrow \partial_A = \Delta$ is the $\mathbb{A}$-locally optimal decision rule. If the set $A(t)$ of attributes occurring in the prime implicant $t$ has the minimal number of attributes (among all prime implicants of $f_{MG_1(\Delta)}$), then the rule is $\mathbb{A}$-globally optimal.
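Theorem 4.1 suggests a three-step procedure: build $MG_1(\Delta)$, compute the prime implicants of its discernibility function, and read off one rule per implicant. The Python sketch below is a brute-force illustration on a hypothetical four-object table, not the authors' implementation. It assumes that each entry $c_{ij} - \{d\}$ is the set of condition attributes on which $x_i$ and $x_j$ differ, and it uses the fact that the prime implicants of a monotone CNF are exactly its minimal hitting sets; the enumeration is exponential in the number of attributes. The names `mg1`, `prime_implicants` and `rule_disjuncts` are hypothetical.

```python
from itertools import combinations

# Hypothetical decision table; rows are pairwise distinct on a, b, c,
# so here partial_A(x) = {d(x)}.
ROWS = [
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 2},
    {"a": 0, "b": 1, "c": 1, "d": 2},
    {"a": 0, "b": 0, "c": 0, "d": 1},
]
COND = ("a", "b", "c")


def generalized_decision(rows, cond):
    """partial_A(x_i): decisions of all rows indiscernible from row i on `cond`."""
    def key(r):
        return tuple(r[c] for c in cond)
    return [frozenset(s["d"] for s in rows if key(s) == key(r)) for r in rows]


def mg1(rows, cond, delta):
    """Entries c_ij - {d} of MG_1(delta), as sets of condition attributes."""
    gd = generalized_decision(rows, cond)
    return [
        frozenset(a for a in cond if rows[i][a] != rows[j][a])
        for i, j in combinations(range(len(rows)), 2)
        if (gd[i] == delta) != (gd[j] == delta)  # the "xor" condition
    ]


def prime_implicants(clauses, attrs):
    """Prime implicants of the monotone CNF: AND over clauses of (OR of clause).

    These are the minimal attribute sets meeting every clause; brute force.
    """
    hitting = [
        frozenset(s)
        for k in range(len(attrs) + 1)
        for s in combinations(attrs, k)
        if all(set(s) & c for c in clauses)
    ]
    return [s for s in hitting if not any(t < s for t in hitting)]


def rule_disjuncts(rows, cond, delta, t):
    """INF_1: value patterns on A(t) of the rows with partial_A = delta."""
    gd = generalized_decision(rows, cond)
    return {
        tuple(sorted((a, r[a]) for a in t))
        for r, g in zip(rows, gd)
        if g == delta
    }
```

For $\Delta = \{1\}$ this table yields the prime implicants $\{b\}$ and $\{a, c\}$. The smaller one gives the globally optimal rule $(b = 0) \Rightarrow \partial_A = \{1\}$, while $\{a, c\}$ gives the locally optimal rule $(a = 1 \wedge c = 1) \vee (a = 0 \wedge c = 0) \Rightarrow \partial_A = \{1\}$.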


Condition 1 states that the set defined by $\partial_A = \Delta$ in $\mathbb{A}^{\partial}$ is definable by the "trace" in $\mathbb{A}$ of any prime implicant of $f_{MG_1(\Delta)}$ in $\mathbb{A}$. Hence the decision rule described in Condition 2 is true in $\mathbb{A}^{\partial}$. The rule describes information on the basis of which it is possible to classify objects into the union $\bigcup_{i \in \Delta} X_i$ without the possibility of eliminating any hypothesis from $\{X_i : i \in \Delta\}$. The corresponding value describes the "chance" that an object chosen from $\mathbb{A}$ is classified by $\partial_A$ into $\Delta$ (Condition 3).
Decision rules of the second type have the following form:

$\tau \Rightarrow \bigvee\{\partial_A = \theta : \theta \subseteq \Delta\}$, where $\Delta \subseteq \{1, \ldots, r(d)\}$ and $\tau \in C_{\mathbb{A}}$.

One can prove the following theorem:

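Theorem 4.2 Let $\mathbb{A} = (U, A \cup \{d\})$ be a decision table, $\Delta \subseteq \{1, \ldots, r(d)\}$, and let the modified discernibility matrix $MG_2(\Delta)$ (for a given $\Delta$) be equal to

$\{c_{ij} - \{d\} : c_{ij} \in M(\mathbb{A}) \;\&\; (\partial_A(x_i) \subseteq \Delta \text{ xor } \partial_A(x_j) \subseteq \Delta)\}$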

Then for any prime implicant $t$ of $f_{MG_2(\Delta)}$ we have:

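1. $\left(\bigvee\{\partial_A = \theta : \theta \subseteq \Delta\}\right)_{\mathbb{A}^{\partial}} = \left(\bigvee\{\tau_u : u \in INF_2(A(t), \Delta, \mathbb{A}^{\partial})\}\right)_{\mathbb{A}}$,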

where $INF_2(A(t), \Delta, \mathbb{A}^{\partial}) = \{\{(a, a(x)) : a \in A(t)\} : \partial_A(x) \subseteq \Delta \;\&\; x \in U\}$;

2. $\bigvee\{\tau_u : u \in INF_2(A(t), \Delta, \mathbb{A}^{\partial})\} \Rightarrow \bigvee\{\partial_A = \theta : \theta \subseteq \Delta\}$

is the $\mathbb{A}$-locally optimal decision rule. If the set $A(t)$ of attributes occurring in the prime implicant $t$ has the minimal number of attributes (among all prime implicants of $f_{MG_2(\Delta)}$), then the rule is $\mathbb{A}$-globally optimal.

3. $Bel_{\mathbb{A}}(\chi^{-1}(\Delta)) = \ldots$


Again, Condition 1 states that the set defined by $\bigvee\{\partial_A = \theta : \theta \subseteq \Delta\}$ in $\mathbb{A}^{\partial}$ is definable by the "trace" in $\mathbb{A}$ of any prime implicant of $f_{MG_2(\Delta)}$ in $\mathbb{A}$. Hence the decision rule described in Condition 2 is true in $\mathbb{A}^{\partial}$. The rule describes conditions on the basis of which it is possible to classify objects with certainty into the union $\bigcup_{i \in \Delta} X_i$. The value $Bel_{\mathbb{A}}(\chi^{-1}(\Delta))$ describes the "chance" that an object chosen from $\mathbb{A}$ is classified with certainty into the union $\bigcup_{i \in \Delta} X_i$ (Condition 3 and Theorem 3.3).

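Decision rules of the third type have the following form:

$\tau \Rightarrow \bigvee\{\partial_A = \theta : \theta \cap \Delta \neq \emptyset\}$, where $\Delta \subseteq \{1, \ldots, r(d)\}$ and $\tau \in C_{\mathbb{A}}$.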

One  can  prove  the  following  theorem: 




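Theorem 4.3 Let $\mathbb{A} = (U, A \cup \{d\})$ be a decision table, $\Delta \subseteq \{1, \ldots, r(d)\}$, and

let the modified discernibility matrix $MG_3(\Delta)$ (for a given $\Delta$) be equal to

$\{c_{ij} - \{d\} : c_{ij} \in M(\mathbb{A}) \;\&\; (\partial_A(x_i) \cap \Delta = \emptyset \text{ xor } \partial_A(x_j) \cap \Delta = \emptyset)\}$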


Then for any prime implicant $t$ of $f_{MG_3(\Delta)}$ we have:


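1. $\left(\bigvee\{\partial_A = \theta : \theta \cap \Delta \neq \emptyset\}\right)_{\mathbb{A}^{\partial}} = \left(\bigvee\{\tau_u : u \in INF_3(A(t), \Delta, \mathbb{A}^{\partial})\}\right)_{\mathbb{A}}$,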

where $INF_3(A(t), \Delta, \mathbb{A}^{\partial}) = \{\{(a, a(x)) : a \in A(t)\} : \partial_A(x) \cap \Delta \neq \emptyset \;\&\; x \in U\}$;

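2. $\bigvee\{\tau_u : u \in INF_3(A(t), \Delta, \mathbb{A}^{\partial})\} \Rightarrow \bigvee\{\partial_A = \theta : \theta \cap \Delta \neq \emptyset\}$

is the $\mathbb{A}$-locally optimal decision rule. If the set $A(t)$ of attributes occurring in the prime implicant $t$ has the minimal number of attributes (among all prime implicants of $f_{MG_3(\Delta)}$), then the rule is $\mathbb{A}$-globally optimal.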

3. $Pl_{\mathbb{A}}(\chi^{-1}(\Delta)) = \dfrac{|\{x \in U : \partial_A(x) \cap \Delta \neq \emptyset\}|}{|U|}$.

Again, Condition 1 states that the set defined by $\bigvee\{\partial_A = \theta : \theta \cap \Delta \neq \emptyset\}$ in $\mathbb{A}^{\partial}$ is definable by the "trace" in $\mathbb{A}$ of any prime implicant of $f_{MG_3(\Delta)}$ in $\mathbb{A}$. Hence the decision rule described in Condition 2 is true in $\mathbb{A}^{\partial}$. The rule describes information about conditions from $\mathbb{A}$ on the basis of which one can classify objects as possibly belonging to some $X_i$ with $i \in \Delta$. The value $Pl_{\mathbb{A}}(\chi^{-1}(\Delta))$ describes the "chance" that an object chosen from $\mathbb{A}$ is classified as possibly belonging to $\bigcup_{i \in \Delta} X_i$ (Condition 3 and Corollary 3.4).

It follows from the above facts that the certainty coefficients computed as the values of the basic probability assignments, belief functions or plausibility functions can be important in decision making. Rough membership functions can also be used for the same purpose. For example, in the case of decision rules of the first type, the distribution of objects satisfying the disjuncts $\tau_u$ (occurring on the left-hand side of the rules) over the sets from $\{X_i : i \in \Delta\}$ can be characterized by the rough membership values $\mu^{\mathbb{B}}_X(u)$, where $u \in INF_1(A(t), \Delta, \mathbb{A}^{\partial})$, $X \in CLASS_{\mathbb{A}}(d)$, $\mathbb{B} = (U, A(t))$, and $t$ is a prime implicant of $f_{MG_1(\Delta)}$.
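All of these certainty coefficients are simple counts over the table. The sketch below is a hedged illustration: it assumes the counting forms $m(\Delta) = |\{x : \partial_A(x) = \Delta\}|/|U|$, $Bel(\Delta) = |\{x : \partial_A(x) \subseteq \Delta\}|/|U|$ and $Pl(\Delta) = |\{x : \partial_A(x) \cap \Delta \neq \emptyset\}|/|U|$, as in the rough-set view of evidence theory [9], and the standard rough membership function $\mu^{\mathbb{B}}_X(x) = |[x]_{\mathbb{B}} \cap X| / |[x]_{\mathbb{B}}|$ [6]. The table and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical decision table: condition attributes a, b; decision d.
ROWS = [
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 2},
    {"a": 0, "b": 1, "d": 2},
    {"a": 0, "b": 0, "d": 1},
]
COND = ("a", "b")


def key(row, attrs):
    return tuple(row[a] for a in attrs)


def generalized_decision(rows, cond):
    """partial_A(x) for each row: decisions of rows indiscernible from it on `cond`."""
    return [frozenset(s["d"] for s in rows if key(s, cond) == key(r, cond)) for r in rows]


def coefficients(rows, cond, delta):
    """Assumed counting forms of m, Bel and Pl for a set `delta` of decision values."""
    gd = generalized_decision(rows, cond)
    n = len(rows)
    m = Fraction(sum(g == delta for g in gd), n)
    bel = Fraction(sum(g <= delta for g in gd), n)
    pl = Fraction(sum(bool(g & delta) for g in gd), n)
    return m, bel, pl


def rough_membership(rows, attrs, x, value):
    """mu_X^B(x) = |[x]_B & X| / |[x]_B|, with B = attrs and X = {y : d(y) = value}."""
    block = [r for r in rows if key(r, attrs) == key(rows[x], attrs)]
    return Fraction(sum(r["d"] == value for r in block), len(block))
```

With $\Delta = \{1\}$ this table gives $m = Bel = 1/4$ and $Pl = 3/4$; exact fractions keep the coefficients free of rounding.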

Let us observe that the introduced certainty coefficients are computable from a given decision table. The problem of generating globally optimal decision rules is NP-hard (the proof is analogous to that for the minimal reduct problem [10]), but it is possible to construct efficient heuristics that generate decision rules in a form "near" to globally optimal.
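One common way to obtain rules "near" to globally optimal is a greedy hitting-set heuristic on the discernibility clauses. The sketch below uses the classic greedy choice (take the attribute occurring in the most clauses not yet met) followed by a pruning pass. It is a standard heuristic chosen for illustration, not necessarily the heuristics the authors have in mind, and it can return a prime implicant that is not of minimum size; the function name is hypothetical.

```python
def greedy_implicant(clauses):
    """Greedy heuristic for a short implicant of a monotone CNF (list of attribute sets).

    Repeatedly picks the attribute occurring in the most clauses not yet met,
    then prunes attributes that became redundant. The result meets every
    clause and is a prime implicant, but not necessarily one of minimum size.
    """
    clauses = [set(c) for c in clauses if c]  # skip empty entries, which cannot be met
    remaining = list(clauses)
    chosen = []
    while remaining:
        counts = {}
        for c in remaining:
            for a in c:
                counts[a] = counts.get(a, 0) + 1
        best = max(sorted(counts), key=counts.get)  # ties broken alphabetically
        chosen.append(best)
        remaining = [c for c in remaining if best not in c]
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if all(c & set(rest) for c in clauses):
            chosen = rest
    return frozenset(chosen)
```

The pruning pass keeps an attribute only if it is still needed, so the result cannot be shrunk further; the greedy phase runs in time polynomial in the size of the matrix, unlike exhaustive prime-implicant enumeration.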

In [12], $\beta$-lower and $\beta$-upper approximations of sets are defined for $0 \le \beta < 0.5$ (the classical case [5] is obtained when $\beta = 0$). One can extend our method of optimal decision rule generation to this case. It is enough to consider, instead of $\partial_A$, a new decision attribute $\partial_A^{\beta}$ and then to follow the procedures presented above. The decision $\partial_A^{\beta}$ is defined as follows:

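if $\mu_{X_i}(s) \ge 1 - \beta$ for some $i$
then $\partial_A^{\beta}(s) = \{i\}$
else if $\{i : \beta < \mu_{X_i}(s) < 1 - \beta\} = \emptyset$
then $\partial_A^{\beta}(s) = \{0\}$
else $\partial_A^{\beta}(s) = \{i : \beta < \mu_{X_i}(s) < 1 - \beta\}$.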

The discussed methods of decision rule generation are implemented in a system for classifying objects.


Conclusions 

In this paper we have proposed a method for generating optimal decision rules with certainty coefficients. Our method can also be applied to generate decision rules with a minimal number of descriptors in each disjunct on the left-hand side of the rules.

Finally, we would like to suggest some topics for further investigation.

In some cases our procedures can generate a rule with many disjuncts on the left-hand side. Moreover, each disjunct may be supported by only a few examples. In this case the attributes chosen for decision making are inappropriate, namely they are not suitable for expressing the characteristic properties of the decision classes, and a search for new, more appropriate attributes (classifiers) is necessary. A method of searching for classifiers based on some multi-modal logics is suggested in [11]. The application of logics to the search for classifiers is an exciting research area.

We would also like to investigate logics with belief functions. The semantics of these logics is based on the so-called decision table maps. They are Kripke models with worlds indexed by information vectors defined by a given decision table $\mathbb{A}$, with the accessibility relation between worlds defined by the information inclusion relation, and with special structures attached to every information vector. Each such structure is defined by a restriction of $\mathbb{A}$ to the information labelling that structure and contains belief functions restricted to that table. We investigate these logics as candidates for expressing new classifiers; in particular, we would like to verify the hypothesis that formulas of these logics are often suitable for expressing characteristic properties of object classes.

The well-known rule for combining evidence from independent sources of information is the Dempster-Shafer rule [7]. In general, the combination rule should be based not only on the bpa functions but also on properties of the knowledge embedded in both sources. Investigating an appropriate logic for this kind of reasoning is an interesting problem.


References 

1. R.K. Bhatnagar, L.N. Kanal: Handling uncertain information: A review of numeric and non-numeric methods. In: L.N. Kanal, J.F. Lemmer (eds.): Uncertainty in Artificial Intelligence. Amsterdam: North-Holland 1986

2.  F.M.  Brown:  Boolean  reasoning.  Dordrecht:  Kluwer  1990 

3. Y. Kodratoff, R. Michalski: Machine Learning. An Artificial Intelligence Approach, vol. 3. San Mateo: Morgan Kaufmann 1990



4. Z. Pawlak: Rough sets. International Journal of Information and Computer Science 11, 344-356 (1982)

5. Z. Pawlak: Rough Sets: Theoretical Aspects of Reasoning about Data. Dordrecht: Kluwer 1991

6. Z. Pawlak, A. Skowron: Rough membership functions. To appear in: M. Federizzi, J. Kacprzyk and R.R. Yager (eds.): Advances in the Dempster-Shafer Theory of Evidence. New York: John Wiley and Sons

7. G. Shafer: A Mathematical Theory of Evidence. Princeton: Princeton University Press 1976

8.  G.  Shafer,  J.  Pearl:  Readings  in  Uncertain  Reasoning.  San  Mateo:  Morgan 
Kaufmann  1990 

9. A. Skowron, J.W. Grzymala-Busse: From rough set theory to evidence theory. To appear in: M. Federizzi, J. Kacprzyk and R.R. Yager (eds.): Advances in the Dempster-Shafer Theory of Evidence. New York: John Wiley and Sons

10. A. Skowron, C. Rauszer: The discernibility matrices and functions in information systems. In: R. Slowinski (ed.): Decision Support by Experience - Applications of the Rough Sets Theory. Dordrecht: Kluwer 1992, pp. 331-362

11. A. Skowron, J. Stepaniuk: Searching for classifiers. In: M. De Glas, D. Gabbay: Proc. of First World Conference on Foundations of Artificial Intelligence, Paris, July 1-5, 1991. Paris: Angkor 1991, pp. 447-460

12.  W.  Ziarko:  Analysis  of  uncertain  information  in  the  framework  of  variable 
precision  rough  set  model.  In:  International  Workshop  Rough  Sets:  State  of 
the  Art  and  Perspectives,  Poznan  -  Kiekrz  (Poland),  September  2-4,  1992. 
Extended  Abstracts.  Poznan  1992,  pp. 74-77 


Upper  and  Lower  Entropies  of  Belief  Functions 
Using  Compatible  Probability  Functions 


C.W.R.  Chau,  P.  Lingras  and  S.K.M.  Wong 

Department  of  Computer  Science,  University  of  Regina 
Regina,  Saskatchewan,  Canada  S4S  0A2 


Abstract. This paper uses the compatible probability functions to define the notions of upper entropy and lower entropy of a belief function as a generalization of the Shannon entropy. The upper entropy measures the amount of information conveyed by the evidence currently available. The lower entropy measures the maximum possible amount of information that can be obtained if further evidence becomes available. This paper also analyzes the different characteristics of these entropies and their computational aspects. The study demonstrates the usefulness of compatible probability functions for applying various notions from probability theory to the theory of belief functions.


1  Introduction 

The  concepts  of  belief  functions  were  originally  derived  from  the  upper  and  lower 
probabilities  using  a  multivalued  mapping  [3]  or  a  compatibility  relation  [14]. 
Compatibility relations can be used to develop a class of compatible probability
functions  for  a  given  belief  function  [2,  5,  10].  The  Bayesian  theory  of  probability 
contains  important  concepts  such  as  the  Bayes  rule  of  conditionalization  and 
various  information  measures  that  are  useful  for  making  numeric  judgments. 
This  study  illustrates  how  compatible  probability  functions  can  be  used  to  apply 
similar  concepts  in  the  theory  of  belief  functions. 

The measurement of information contained in a belief structure is an important issue in any theory of partial belief [13]. Shannon [15] introduced the entropy function to measure the information contained in a Bayesian probability function. The Shannon entropy has been found useful in a variety of applications [6, 12]. In order to adopt Shafer's theory for similar applications, it is necessary to introduce an appropriate information measure for belief functions. Shafer used the weight of evidence to estimate the information contained in a class of belief functions called the separable support functions. Unfortunately, Shafer's measure cannot be used for other categories of belief functions. Yager [19] proposed entropy-like and specificity-like measures which can be applied to all categories of belief functions. The entropy-like measure provides an indication of the internal conflict in the given evidence, while the specificity-like measure captures the non-specific uncertainty. These two measures are said to complement each other, but the relationship between them is not clearly described. It is also not very clear how one can use the measures suggested by Yager, or their variations [4, 9, 16], to represent the amount of information contained in a belief function.




This study introduces the notions of upper and lower bounds of the entropy of a belief function using compatible probability functions. The upper and lower entropies provide an objective criterion for measuring the information contained in a belief function. The upper entropy measures the information conveyed by the belief function based on existing evidence. The lower entropy, on the other hand, represents the maximum possible information that can be obtained by accumulation of evidence. For a Bayesian belief function, both the upper and lower entropies reduce to the Shannon entropy. Thus, they provide a generalization of the concept of entropy for probability functions.

2  Brief  Review 

For completeness, we summarize here some of the basic notions in the theory of belief functions [13]. Some earlier work on information measures is also briefly reviewed.


2.1  Belief  Functions 

Given some evidence, let $\Theta = \{\theta_1, \ldots, \theta_n\}$ denote the finite set of all possible answers to a question. We refer to $\Theta$ as the frame of discernment, or simply the frame, defined by this question. The power set of $\Theta$, written $2^{\Theta}$, represents the set of all propositions discerned by $\Theta$.

The belief functions introduced by Shafer [13] were originally derived from the concepts of upper and lower probabilities [3], which are useful for transferring the probability from one frame $E$ to another frame $\Theta$ by using a multivalued mapping or a compatibility relation.

Definition  2.1: 

Consider two frames of discernment $E$ and $\Theta$. An element $e_j \in E$ is compatible with an element $\theta_i \in \Theta$, written $e_j C \theta_i$, if the answer $e_j$ to the question which defines $E$ does not exclude the possibility that $\theta_i$ is an answer to the question which defines $\Theta$. If an element $e_j \in E$ is not compatible with an element $\theta_i \in \Theta$, we write $e_j \not C \theta_i$.

Compatibility is symmetric: $e_j C \theta_i$ if and only if $\theta_i C e_j$. The compatibility
relationship  can  be  used  to  define  the  notion  of  implication  between  propositions 
from  two  different  frames  of  discernment  [11]. 

Definition  2.2: 

A proposition $A \in 2^{E}$ is said to imply another proposition $B \in 2^{\Theta}$, written $A \Rightarrow B$, if every element $\theta_i$ of $\Theta$ compatible with some $e_j \in A$ belongs to $B$.

Definition  2.3: 

A proposition $A \in 2^{E}$ is said to exactly imply another proposition $B \in 2^{\Theta}$, written $A \Rightarrow^{*} B$, if $A$ implies $B$ but does not imply any proper subset of $B$. Every $A \in 2^{E}$ exactly implies one and only one $B \in 2^{\Theta}$.


Consider a frame $\Theta$ which denotes the set of possible answers to a question. We are interested in obtaining a probability function $P: 2^{\Theta} \to [0, 1]$ based on the given evidence. Suppose it is not possible to construct such a probability function directly, but from the given evidence we can define a frame, the evidence frame $E$. Moreover, let us assume that, based on the evidence, a probability function $P: 2^{E} \to [0, 1]$ on frame $E$ is known, and for simplicity let every $e_j \in E$ be compatible with at least one $\theta_i \in \Theta$. The issue here is how to use this knowledge about the evidence frame $E$ to determine the degrees of belief in the propositions discerned by $\Theta$.

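In principle, the probability function $P$ on frame $\Theta$ can be constructed from the probability function on frame $E$ using the Bayes rule of conditionalization:

$$P(\{\theta_i\}) = \sum_{e_j \in E} P(\{e_j\})\, P(\{\theta_i\} \mid \{e_j\}), \tag{1}$$

where $P(\{\theta_i\} \mid \{e_j\})$ are the conditional probabilities. Note that the expression $P(\{e_j\})\, P(\{\theta_i\} \mid \{e_j\})$ is in fact equivalent to the joint probability $P(\{(\theta_i, e_j)\})$ defined on the frame $\Theta \times E$, i.e.,

$$P(\{(\theta_i, e_j)\}) = P(\{e_j\}) \times P(\{\theta_i\} \mid \{e_j\}). \tag{2}$$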
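Equations (1) and (2) can be checked numerically on a small example. The frames, probabilities and conditional table below are made-up values for illustration only, and the function names are hypothetical.

```python
# Hypothetical evidence frame E = {e1, e2} and frame Theta = {t1, t2, t3}.
P_E = {"e1": 0.6, "e2": 0.4}
# Hypothetical conditional probabilities P({theta_i} | {e_j}); each row sums to 1.
CONDITIONAL = {
    "e1": {"t1": 0.5, "t2": 0.5, "t3": 0.0},
    "e2": {"t1": 0.0, "t2": 0.25, "t3": 0.75},
}


def joint(p_e, conditional):
    """Equation (2): P({(theta_i, e_j)}) = P({e_j}) * P({theta_i} | {e_j})."""
    return {
        (t, e): p_e[e] * p
        for e, row in conditional.items()
        for t, p in row.items()
    }


def marginal_theta(p_e, conditional):
    """Equation (1): P({theta_i}) as the sum of the joint probabilities over e_j."""
    out = {}
    for (t, _e), p in joint(p_e, conditional).items():
        out[t] = out.get(t, 0.0) + p
    return out
```

Here `marginal_theta(P_E, CONDITIONAL)` gives approximately 0.3, 0.4 and 0.3 for t1, t2 and t3. When the rows of `CONDITIONAL` are unknown, as discussed next, this computation is no longer possible.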

In  practice  it  is  not  always  possible  to  provide  an  accurate  estimation  of  the 
required  conditional  probabilities  in  the  Bayes  rule.  In  such  situations,  one  may 
not be able to compute the probability $P(\{\theta_i\})$ from the Bayes rule as defined by equation 1. However, one may instead construct belief functions to estimate the degrees of belief in the propositions of $2^{\Theta}$ as follows.

Given the probability function $P: 2^{E} \to [0, 1]$ on the evidence frame $E$, we can define a function $m_E: 2^{\Theta} \to [0, 1]$ for any proposition $F \in 2^{\Theta}$ as:

The value $m_E(F)$ is the portion of the probability mass that is attributed to the union of those propositions in $2^{E}$ which exactly imply the proposition $F \in 2^{\Theta}$. A proposition $F$ is called a focal element of the bpa $m_E$ if $m_E(F) > 0$. The value $m_E(F)$ measures the belief that one commits exactly to the proposition $F$. The total belief committed to a proposition $A$ is given by:

$$Bel_E(A) = \sum_{F \subseteq A} m_E(F). \tag{4}$$

$Bel_E: 2^{\Theta} \to [0, 1]$ is called a belief function. Another quantity, referred to as the plausibility and written $Pl$, is defined by:

$$Pl(A) = 1 - Bel(\neg A), \tag{5}$$

where $\neg A$ denotes the negation of $A$.
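The chain from $P$ on $E$ to $m_E$, $Bel_E$ and $Pl$ can be sketched in a few lines. The sketch assumes our reading of the definition of $m_E$: $m_E(F)$ is the total probability of those $e_j$ with $\{e_j\} \Rightarrow^{*} F$, i.e. whose set of compatible elements is exactly $F$. The frames, numbers and function names are hypothetical.

```python
# Hypothetical frame Theta, evidence-frame probabilities P({e_j}) and
# compatibility relation C (the set of theta_i compatible with each e_j).
THETA = frozenset({"t1", "t2", "t3"})
P_E = {"e1": 0.5, "e2": 0.3, "e3": 0.2}
COMPATIBLE = {"e1": {"t1"}, "e2": {"t1", "t2"}, "e3": {"t1", "t2", "t3"}}


def bpa(p_e, compatible):
    """m_E(F): total P({e_j}) over the e_j with {e_j} =>* F (assumed form of m_E)."""
    m = {}
    for e, p in p_e.items():
        focal = frozenset(compatible[e])
        m[focal] = m.get(focal, 0.0) + p
    return m


def bel(m, a):
    """Equation (4): total mass of the focal elements contained in A."""
    return sum(v for f, v in m.items() if f <= frozenset(a))


def pl(m, a, theta=THETA):
    """Equation (5): Pl(A) = 1 - Bel(not A)."""
    return 1.0 - bel(m, theta - frozenset(a))
```

With these numbers, $Bel(\{t_1, t_2\}) = 0.8$ and $Pl(\{t_3\}) = 0.2$: only the mass of the vacuous element $e_3$ can reach $t_3$.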

In many instances the information about the evidence may be so vague that it is not even possible to explicitly construct the evidence frame. However, when a basic probability assignment $m_E$ defined on the propositions in $2^{\Theta}$ is known, it is always possible to construct an abstract evidence frame $E$ as follows. Let $F_1, \ldots, F_k$ be the focal elements of $m_E$. For every focal element $F_j$ of $m_E$ we assume that there is a unique $e_j \in E$ such that $\{e_j\} \Rightarrow^{*} F_j$, that is, $e_j C \theta_i$ for all $\theta_i \in F_j$ and $e_j \not C \theta_i$ for all $\theta_i \notin F_j$. This means that the number of elements in the abstract frame $E$ will be the same as the number of focal elements of $m_E$. Thus the known bpa $m_E$ can be viewed as a probability function defined on the abstract evidence frame $E$ as:

$$P(\{e_j\}) = m_E(F_j). \tag{6}$$

Hereafter we will represent a belief function on a frame $\Theta$ in terms of an underlying probability function $P$ defined on a distinct frame $E$ and a compatibility relation $C$ between frames $E$ and $\Theta$. It is understood that both the frame $E$ and the compatibility relation $C$ could be the abstract constructs defined above, which may lack any semantics due to insufficient information.

One  important  feature  in  any  theory  of  partial  belief  is  the  rule  for  combining 
evidence  from  different  knowledge  sources.  The  theory  of  belief  functions  uses 
the  Dempster  rule  for  combining  two  independent  pieces  of  evidence  [13,  11]. 
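For reference, Dempster's rule combines two basic probability assignments by intersecting focal elements and renormalizing away the mass that falls on the empty set. The following is a minimal sketch with bpas represented as dictionaries from frozensets to masses; the representation and the function name are chosen for illustration.

```python
def dempster(m1, m2):
    """Dempster's rule: m(C) is proportional to the sum of m1(A) * m2(B) over A & B = C.

    Mass landing on the empty intersection is the conflict K; the remaining
    masses are divided by 1 - K.
    """
    combined, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            c = a & b
            if c:
                combined[c] = combined.get(c, 0.0) + x * y
            else:
                conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence; combination undefined")
    return {c: v / (1.0 - conflict) for c, v in combined.items()}
```

For example, combining $m_1(\{t_1\}) = 0.6$, $m_1(\Theta) = 0.4$ with $m_2(\{t_1, t_2\}) = m_2(\Theta) = 0.5$ involves no conflict and yields masses 0.6, 0.2 and 0.2 on $\{t_1\}$, $\{t_1, t_2\}$ and $\Theta$.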

2.2  Information  measures 

Every probability or belief function conveys certain information about the frame of discernment $\Theta$. A quantitative measure of this information can be very useful for comparing various belief or probability functions.

Shannon [15] proposed the entropy function $H: \mathcal{P} \to [0, 1]$,

$$H(P) = -\sum_{\theta_i \in \Theta} P(\{\theta_i\}) \log_n P(\{\theta_i\}), \tag{7}$$

where $n = |\Theta|$, to measure the amount of information conveyed by a given probability function $P$. The probability function $P: 2^{\Theta} \to [0, 1]$ conveys more information about the frame $\Theta$ if its entropy $H(P)$ is relatively small [15]. The entropy function $H$ has been found very useful in various applications of Bayesian probability theory [6, 12].
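Equation (7) takes only a few lines; using the logarithm to base $n = |\Theta|$ is what keeps $H$ in $[0, 1]$. The function below is a sketch in which representing $P$ as a dictionary is an assumption.

```python
import math


def entropy(p):
    """Equation (7) with log base n = |Theta|, so that 0 <= H(P) <= 1.

    `p` maps each theta_i to P({theta_i}); zero terms contribute nothing.
    """
    n = len(p)
    if n < 2:
        return 0.0  # a one-element frame leaves no uncertainty
    return -sum(v * math.log(v, n) for v in p.values() if v > 0)
```

The uniform distribution attains $H = 1$ and a distribution concentrated on one element attains $H = 0$, the two extremes used in the discussion of upper and lower entropies.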

Shafer  [13]  introduced  the  weight  of  evidence  to  measure  the  information  or 
evidence  contained  in  a  class  of  belief  functions  called  separable  support  functions. 
The  weight  of  evidence  represents  a  gain  in  information  with  the  accumulation  of 
evidence.  This  is  a  desirable  property  for  any  reasonable  measure  of  information. 
Unfortunately,  the  weight  of  evidence  cannot  be  applied  to  a  more  general  class 
of  belief  functions. 

Yager [19] proposed two measures that can be applied to any category of belief functions. The entropy-like measure $E_m$ measures the internal conflict in a body of evidence. The specificity-like measure $S_m$ measures the non-specific uncertainty of a belief function. Several researchers [4, 9, 16] have proposed variations of Yager's measures. However, it is not clear how one can use the pair $(S_m, E_m)$, or its variations, to measure the amount of information contained in a belief function.




3  Class  of  Compatible  Probability  Functions 

The basic probability assignment $m(F_j)$, or the underlying probability function $P(\{e_j\})$ defined on the evidence frame $E$, can be interpreted as the probability mass that can move freely within the subset $F_j$ of $\Theta$ [13]. However, there is no knowledge about the exact allocation of portions of $P(\{e_j\})$ to the individual elements $\theta_i \in F_j$. This type of non-specific uncertainty, called ignorance, is as inherent in the theory of belief functions as probabilistic uncertainty is in the Bayesian theory. Any additional knowledge that reflects the exact allocation of $P(\{e_j\})$ within $F_j$, such as the conditional probability values $P(\{\theta_i\} \mid \{e_j\})$, will be helpful in reducing the ignorance. In fact, if such knowledge is available for every focal element of a belief function, then the degree of ignorance decreases to zero and consequently the belief function reduces to a Bayesian probability function as given by equation 1. In such a case, one can simply use the Shannon entropy function $H$ defined by equation 7 to measure the degree of probabilistic uncertainty induced by the probability function $P$.

If some of the actual conditional probabilities $P(\{\theta_i\} \mid \{e_j\})$ are not available, the Shannon entropy function $H$ cannot be applied. However, it may be possible to construct a class of probability functions $P(\{\theta_i\})$ on $\Theta$ using the probability function $P(\{e_j\})$ defined on the evidence frame $E$ and the compatibility relation between $E$ and $\Theta$. The entropy function $H$ can then be applied to this class of probability functions $P(\{\theta_i\})$.

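Let $P(\{(\theta_i, e_j)\})$, given by equation 2, be the portion of the known probability mass $P(\{e_j\})$ allocated to $\theta_i$ in the focal element $F_j$. The constraints on $P(\{(\theta_i, e_j)\})$ are:

$$\sum_{\theta_i \in \Theta} P(\{(\theta_i, e_j)\}) = P(\{e_j\}) \quad \text{for every } e_j \in E, \tag{8}$$

$$P(\{(\theta_i, e_j)\}) = 0 \quad \text{if } \theta_i \not C e_j, \tag{9}$$

$$P(\{(\theta_i, e_j)\}) \ge 0 \quad \text{if } \theta_i C e_j. \tag{10}$$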

Any probability function $P: 2^{\Theta \times E} \to [0, 1]$ obeying these constraints is called an extension [7] of $P: 2^{E} \to [0, 1]$, or of the corresponding belief function $Bel_E$. In general, there will be an infinite number of such extensions for a belief function. Each of these extensions uniquely determines a probability function $P: 2^{\Theta} \to [0, 1]$, namely:

$$P(\{\theta_i\}) = \sum_{e_j \in E} P(\{(\theta_i, e_j)\}), \quad \text{for every } \theta_i \in \Theta. \tag{11}$$

Any probability function $P: 2^{\Theta} \to [0, 1]$ satisfying equation 11 is said to be compatible with the probability function $P: 2^{E} \to [0, 1]$, or with the corresponding belief function $Bel_E$. The compatible probability functions represent a class of Bayesian probability functions that could be obtained if additional information, such as the conditional probabilities $P(\{\theta_i\} \mid \{e_j\})$, were available. In practice, these conditional probabilities are usually not available, especially if frame $E$ is an abstract construct as previously defined. The compatible probability functions were studied independently by Fagin and Halpern [5] and Lingras [10].
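Constraints (8)-(10) and equation (11) translate into a direct check. In the sketch below an allocation maps pairs $(\theta_i, e_j)$ to $P(\{(\theta_i, e_j)\})$. The frames, numbers and function names are hypothetical, and the tolerance `tol` is an assumption added to absorb floating-point rounding.

```python
# Hypothetical evidence frame: P({e_j}) and the set of theta_i compatible with e_j.
P_E = {"e1": 0.5, "e2": 0.5}
COMPATIBLE = {"e1": {"t1"}, "e2": {"t1", "t2", "t3"}}


def is_extension(alloc, p_e, compatible, tol=1e-9):
    """Check constraints (8)-(10) for alloc[(theta, e)] = P({(theta_i, e_j)})."""
    for e, pe in p_e.items():
        row = {t: v for (t, ej), v in alloc.items() if ej == e}
        if abs(sum(row.values()) - pe) > tol:            # constraint (8)
            return False
        for t, v in row.items():
            if v < -tol:                                   # non-negativity, as in (10)
                return False
            if t not in compatible[e] and abs(v) > tol:    # constraint (9)
                return False
    return True


def compatible_probability(alloc):
    """Equation (11): P({theta_i}) = sum over e_j of P({(theta_i, e_j)})."""
    out = {}
    for (t, _e), v in alloc.items():
        out[t] = out.get(t, 0.0) + v
    return out
```

Splitting the mass of `e2` evenly between t2 and t3 gives one extension and hence one compatible probability function (0.5, 0.25, 0.25); every other admissible split of `e2` gives another, which is why the class is infinite.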

The original definition of belief and plausibility as lower and upper probabilities, respectively, can easily be illustrated using the class of compatible probability functions as follows:

$$Bel(A) = \min P(A), \quad \text{over all } P: 2^{\Theta} \to [0, 1] \text{ compatible with } Bel, \tag{12}$$

$$Pl(A) = \max P(A), \quad \text{over all } P: 2^{\Theta} \to [0, 1] \text{ compatible with } Bel. \tag{13}$$
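Equations (12) and (13) can be checked by brute force on a small frame. Because $P(A)$ is linear in the allocation of each focal mass, its minimum and maximum over all compatible probability functions are attained when every focal mass sits on a single element of its focal element, so it suffices to enumerate those extreme allocations. The bpa and the function names below are hypothetical.

```python
from itertools import product

# Hypothetical bpa on Theta = {t1, t2, t3}: focal element -> mass.
M = {
    frozenset({"t1"}): 0.5,
    frozenset({"t1", "t2"}): 0.3,
    frozenset({"t1", "t2", "t3"}): 0.2,
}


def extreme_distributions(m):
    """Compatible P obtained by placing each focal mass on one element of its focal set."""
    focal = list(m.items())
    for choice in product(*(sorted(f) for f, _ in focal)):
        p = {}
        for (_f, mass), theta in zip(focal, choice):
            p[theta] = p.get(theta, 0.0) + mass
        yield p


def min_max_probability(m, a):
    """Min and max of P(A) over compatible P (extreme points suffice: P(A) is linear)."""
    values = [sum(v for t, v in p.items() if t in a) for p in extreme_distributions(m)]
    return min(values), max(values)


def bel(m, a):
    """Equation (4)."""
    return sum(v for f, v in m.items() if f <= frozenset(a))


def pl(m, a):
    """Mass of focal elements meeting A; equals 1 - Bel(not A) when the masses sum to 1."""
    return sum(v for f, v in m.items() if f & frozenset(a))
```

For $A = \{t_2, t_3\}$ the sketch finds a minimum of 0 and a maximum of 0.5, matching $Bel(A) = 0$ and $Pl(A) = 0.5$.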

The  Bayesian  theory  of  probability  contains  important  concepts  such  as  the 
Bayes rule of conditionalization and various information measures that are useful
for  making  numeric  judgments.  This  study  illustrates  how  compatible  probability 
functions  can  be  used  to  apply  similar  concepts  in  the  theory  of  belief  functions. 


4  Upper  and  lower  entropies  of  belief  function 

Since a belief function can have an infinite number of compatible probability functions, it can be associated with the entropy of any one of them. We can characterize this range of entropy values by an upper and a lower bound, hereafter referred to as the upper and lower entropies of a belief function. Of course, we do not know the exact entropy value which a belief function would assume within this interval unless the conditional probabilities $P(\{\theta_i\} \mid \{e_j\})$ are known.

Definition 4.1:

The upper entropy of a belief function $Bel$, written $\overline{H}(Bel)$, is given by:

$$\overline{H}(Bel) = \max H(P), \quad \text{over all } P: 2^{\Theta} \to [0, 1] \text{ compatible with } Bel. \tag{14}$$

The lower entropy of a belief function $Bel$, written $\underline{H}(Bel)$, is given by:

$$\underline{H}(Bel) = \min H(P), \quad \text{over all } P: 2^{\Theta} \to [0, 1] \text{ compatible with } Bel. \tag{15}$$

As mentioned before, the amount of information contained in a probability function decreases as its entropy increases. The upper entropy, which corresponds to the least biased (most uniformly distributed) probability function compatible with a belief function, provides a measure of the amount of information contained in the belief function. High values of the upper entropy indicate that the amount of information contained in the belief function is relatively low. Since $0 \le H(P) \le 1$, a belief function with upper entropy $\overline{H}(Bel) = 1$ contains the minimum amount of information. On the other hand, the lower entropy, which corresponds to the most biased probability function compatible with a belief function, represents the maximum possible amount of information that can be obtained if future evidence becomes available. Low values of the lower entropy indicate that the maximum possible amount of information that can be obtained from the belief function by accumulating future evidence is relatively high. Theoretically, a belief function with a lower entropy of zero can potentially lead to a unique outcome.
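Neither bound has a closed form in general. The sketch below is one way to compute both on small frames and is not taken from this paper. The lower entropy is found by enumerating the extreme points of the set of compatible probability functions (each focal mass placed on one element), since the minimum of the concave function $H$ over a polytope is attained at a vertex. The upper entropy uses the iterative maximum-entropy procedure for belief functions described by Meyerowitz, Richman and Walker (1994), which repeatedly gives uniform probability to the set $S$ maximizing the relative belief per element. Both are exponential in $|\Theta|$ or in the number of focal elements; the function names are hypothetical and $|\Theta| \ge 2$ is assumed so that the base-$n$ logarithm is defined.

```python
import math
from itertools import combinations, product


def entropy(p, n):
    """Equation (7) with log base n = |Theta| (n >= 2 assumed)."""
    return -sum(v * math.log(v, n) for v in p.values() if v > 0)


def bel(m, a):
    """Equation (4): total mass of the focal elements contained in A."""
    return sum(v for f, v in m.items() if f <= a)


def lower_entropy(m, theta):
    """Minimum of H over compatible P, attained at an extreme point
    (every focal mass placed on a single element of its focal set)."""
    focal = list(m.items())
    best = math.inf
    for choice in product(*(sorted(f) for f, _ in focal)):
        p = {}
        for (_f, mass), t in zip(focal, choice):
            p[t] = p.get(t, 0.0) + mass
        best = min(best, entropy(p, len(theta)))
    return best


def upper_entropy(m, theta, tol=1e-12):
    """Maximum of H over compatible P (iterative maximum-entropy procedure).

    At each step pick the non-empty set S of remaining elements maximizing
    (Bel(fixed ∪ S) - Bel(fixed)) / |S|, preferring larger S on ties, and
    give each element of S that value as its probability.
    """
    remaining, fixed = frozenset(theta), frozenset()
    p = {}
    while remaining:
        base = bel(m, fixed)
        best_set, best_ratio = None, -1.0
        for k in range(1, len(remaining) + 1):
            for s in map(frozenset, combinations(sorted(remaining), k)):
                ratio = (bel(m, fixed | s) - base) / len(s)
                if ratio > best_ratio + tol or (
                    abs(ratio - best_ratio) <= tol and len(s) > len(best_set)
                ):
                    best_set, best_ratio = s, ratio
        for t in best_set:
            p[t] = best_ratio
        fixed |= best_set
        remaining -= best_set
    return entropy(p, len(theta))
```

For the consonant bpa $m(\{t_1\}) = m(\Theta) = 0.5$ on a three-element frame, the sketch gives a lower entropy of 0 and an upper entropy of $1.5 \log_3 2 \approx 0.946$, so $\Delta H \approx 0.946$; for the vacuous bpa it gives 0 and 1.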

The  upper  and  lower  entropies  of  a  belief  function  show  monotonic  changes 
as  evidence  is  being  accumulated.  In  the  following  subsection  we  will  study  the 
behavior  of  the  upper  and  lower  entropies  with  the  accumulation  of  consistent 
evidence. 

4.1 Accumulation of consistent evidence

Consider belief functions $Bel_E$ and $Bel_S$ defined by two different pieces of evidence. Let $Bel_E \oplus Bel_S$ be the result of the combination of $Bel_E$ and $Bel_S$. The accumulation of consistent evidence should in general lead to an increase in the amount of information about the discerned frame. The weight of evidence suggested by Shafer, by construction, reflects such an increase in information. The following theorem shows that the upper and lower entropies indeed correctly reflect the accumulation of consistent evidence.

Theorem  4.1 

Let $Bel_E$ and $Bel_S$ be the belief functions representing two pieces of evidence consistent [10] with each other. Let $Bel_E \oplus Bel_S$ be the belief function representing the combined evidence. Then,

$$\overline{H}(Bel_E \oplus Bel_S) \le \min(\overline{H}(Bel_E), \overline{H}(Bel_S)), \text{ and} \tag{16}$$

$$\underline{H}(Bel_E \oplus Bel_S) \ge \max(\underline{H}(Bel_E), \underline{H}(Bel_S)). \tag{17}$$

Theorem 4.1 is proved in [2].

According to Theorem 4.1, the upper entropy does not increase under combination. This behavior reflects a gain in information from pooling consistent evidence from different sources. The lower entropy, on the other hand, provides a measure of the maximum possible amount of information that can be obtained by the accumulation of consistent evidence, and by (17) it does not decrease under combination. The range from $\underline{H}(Bel)$ to $\overline{H}(Bel)$ provides a measure of the possible gain in information that can be achieved by accumulating evidence. Thus, the difference $\Delta H = \overline{H}(Bel) - \underline{H}(Bel)$ can be considered a measure of ignorance. As expected, ignorance does not increase with the accumulation of evidence.

The above discussion provides a plausible justification for the use of the upper and lower entropies and the difference $\Delta H$ as measures of information and ignorance. In the following subsection we will illustrate how these measures can be used to characterize different categories of belief functions.


4.2  Different  categories  of  belief  functions 

Information conveyed by different categories of belief functions [13] has different characteristics. The upper and lower entropies enable us to identify these characteristics conveniently.




Theorem  4.2 

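Let $Bel$ be a belief function with $\underline{H}(Bel)$ and $\overline{H}(Bel)$ as its lower and upper entropies. Then

1. If $Bel$ is a consonant support function, then $\underline{H}(Bel) = 0$.

2. If $Bel$ is a vacuous belief function, then $\underline{H}(Bel) = 0$ and $\overline{H}(Bel) = 1$.

3. If $Bel$ is a Bayesian belief function, then $\underline{H}(Bel) = \overline{H}(Bel) = H(Bel)$, where

$$H(Bel) = -\sum_{\theta_i \in \Theta} Bel(\{\theta_i\}) \log_n Bel(\{\theta_i\}) = -\sum_{\theta_i \in \Theta} m(\{\theta_i\}) \log_n m(\{\theta_i\}). \tag{18}$$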


Proof  of  the  theorem  can  be  found  in  [2]. 

Based on Theorem 4.2, one can say that the lower and upper entropies provide an intuitively appealing description of the amount of information contained in a belief function. For example, a consonant support function represents evidence pointing in a single direction [13]. Hence, the combination of a consonant support function with other belief functions which also point in the same direction should eventually lead to a unique outcome, i.e. a belief function with $Bel(\{\theta_i\}) = 1$ for some $\theta_i \in \Theta$. The fact that the lower entropy is equal to 0 for consonant support functions correctly reflects this characteristic. More importantly, the upper entropy enables us to compare the information contained in different consonant support functions. The vacuous belief function is a special type of consonant support function [13]. Hence, the above discussion also applies to the vacuous belief function. The upper entropy of the vacuous belief function is equal to 1, which means that the vacuous belief function conveys no information. Also, the difference between the upper entropy and the lower entropy ($\Delta H$) for the vacuous belief function is equal to 1. This observation appropriately reflects the fact that the vacuous belief function represents maximum ignorance. A Bayesian belief function, on the other hand, represents evidence with minimum ignorance (which is equal to 0) because every focal element is a singleton. This is consistent with the observation that the upper entropy of a Bayesian belief function is equal to its lower entropy (i.e. $\Delta H = 0$).
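The lower-entropy claims of Theorem 4.2 can be checked on small examples. The sketch below assumes that a compatible probability function is obtained by sharing each focal mass $m(A)$ among the elements of $A$, and uses $\log_n$ with $n = |\Theta|$ as in (18). Because entropy is concave, its minimum is attained when every focal element sends all of its mass to a single element, so a brute-force search over those allocations gives the lower entropy exactly. This is an illustration only, not the algorithm of [2]; the names `entropy`, `lower_entropy` and the example mass functions are hypothetical.

```python
from itertools import product
from math import log


def entropy(p, n):
    """Normalized entropy -sum p log_n p; terms with p = 0 contribute 0."""
    h = -sum(q * log(q) for q in p.values() if q > 0) / log(n)
    return max(0.0, h)  # clamp -0.0 and rounding noise


def lower_entropy(m, frame):
    """Lower entropy of the belief function with mass function m.

    m maps frozenset focal elements to masses. Every extreme compatible
    probability function assigns each focal mass to one element of its focal
    element; we enumerate all such choices (exponential, small frames only).
    """
    focal = list(m.items())
    best = None
    for choice in product(*(sorted(a) for a, _ in focal)):
        p = dict.fromkeys(frame, 0.0)
        for (_, mass), x in zip(focal, choice):
            p[x] += mass
        h = entropy(p, len(frame))
        if best is None or h < best:
            best = h
    return best


frame = ["a", "b", "c"]
consonant = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
vacuous = {frozenset("abc"): 1.0}
bayesian = {frozenset("a"): 0.5, frozenset("b"): 0.3, frozenset("c"): 0.2}

for name, m in [("consonant", consonant), ("vacuous", vacuous), ("Bayesian", bayesian)]:
    print(name, round(lower_entropy(m, frame), 4))
```

For the consonant and vacuous examples the search finds an allocation concentrating all mass on one element, while the Bayesian example has a single compatible probability function, whose entropy is given by (18).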


4.3  Computation  of  the  upper  and  lower  entropies 

There are many numerical methods we can use to compute the joint probability function by maximizing the entropy function defined by equation 7 [1, 8]. On the other hand, the entropy function cannot be reliably minimized using standard numerical methods: since entropy is concave, its minimum over the set of compatible probability functions lies at an extreme point of that set, and local descent methods need not reach it. In order to determine the lower entropy, we have to find the most biased compatible probability function. An algorithm for determining the lower entropy, defined by the most biased probability function that is compatible with a belief function, is proposed in [2]. There may be many solutions for $P(\{(\theta_i, e_j)\})$ which maximize or minimize the entropy $H(P)$. However, we are only interested in the maximum or minimum value of $H(P)$, which is unique. The worst-case time complexity of the above algorithms can be very high. However, it may be possible to increase the efficiency of these algorithms using non-linear programming techniques.
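For small frames the upper entropy can also be computed without a general-purpose optimizer. The sketch below swaps in a known combinatorial procedure for the maximum-entropy probability function dominating a belief function (it is not the numerical method of [1, 8]): repeatedly pick the block $A$ of unassigned elements maximizing $(Bel(A \cup U) - Bel(U))/|A|$, where $U$ is the set of elements already assigned, preferring larger blocks on ties, and give each element of $A$ that share. It assumes, as above, that the compatible probability functions are exactly those obtained by sharing focal masses, and it enumerates all subsets, so it is exponential in $|\Theta|$. All names are hypothetical.

```python
from itertools import combinations
from math import log

EPS = 1e-12


def bel(m, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for focal, v in m.items() if focal <= a)


def nonempty_subsets(elems):
    items = sorted(elems)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def max_entropy_distribution(m, frame):
    """Maximum-entropy compatible probability function (small frames only)."""
    p = {}
    used = frozenset()            # elements whose probability is already fixed
    remaining = frozenset(frame)
    while remaining:
        base = bel(m, used)
        if bel(m, used | remaining) - base <= EPS:
            p.update(dict.fromkeys(remaining, 0.0))  # no mass left to place
            break
        best, best_ratio = frozenset(), float("-inf")
        for a in nonempty_subsets(remaining):
            ratio = (bel(m, a | used) - base) / len(a)
            if ratio > best_ratio + EPS or (
                abs(ratio - best_ratio) <= EPS and len(a) > len(best)
            ):
                best, best_ratio = a, ratio
        p.update(dict.fromkeys(best, best_ratio))   # uniform share on the block
        used |= best
        remaining -= best
    return p


def upper_entropy(m, frame):
    p = max_entropy_distribution(m, frame)
    return -sum(q * log(q) for q in p.values() if q > 0) / log(len(frame))


frame = ["a", "b", "c"]
consonant = {frozenset("a"): 0.5, frozenset("ab"): 0.3, frozenset("abc"): 0.2}
vacuous = {frozenset("abc"): 1.0}
bayesian = {frozenset("a"): 0.5, frozenset("b"): 0.3, frozenset("c"): 0.2}

for name, m in [("consonant", consonant), ("vacuous", vacuous), ("Bayesian", bayesian)]:
    print(name, round(upper_entropy(m, frame), 4))
```

In this example the consonant function's maximum-entropy distribution is $(0.5, 0.3, 0.2)$, so its upper entropy is below 1 while its lower entropy is 0, giving $\Delta H > 0$; the vacuous function reaches the upper entropy 1.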


5  Conclusion 

Compatible probability functions can be used to apply some of the important notions from probability theory to the theory of belief functions. This paper introduces the notion of upper and lower entropies of compatible probability functions to measure the amount of information conveyed by a belief function. The upper entropy is the entropy of the most unbiased probability function compatible with the belief function. It denotes the amount of information conveyed by the evidence currently available. The lower entropy, on the other hand, is the entropy of the most biased probability function compatible with the belief function. It denotes the maximum possible amount of information that can be obtained if further evidence becomes available. The upper and lower entropies provide an objective criterion for analyzing the information conveyed by a piece of evidence. The difference between the upper and the lower entropies indicates the ignorance induced by the evidence. The upper entropy, the lower entropy and the difference between them accurately reflect the information gain from the accumulation of consistent evidence.

References 

1. D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, Inc., New York, pp. 71-75, 1982.

2. C.W.R. Chau, P.J. Lingras and S.K.M. Wong, Upper and Lower Entropies of Belief Functions, Technical Report, Department of Computer Science, University of Regina, Sask., Canada, 1990.

3. A. Dempster, Upper and Lower Probabilities Induced by a Multivalued Mapping, Annals of Mathematical Statistics, 38, pp. 325-339, 1967.

4. D. Dubois and H. Prade, Properties of measures of information in evidence and possibility theories, Fuzzy Sets and Systems, 24, 1987.

5. R. Fagin and J. Halpern, A New Approach to Updating Beliefs, Proceedings of the Sixth Conference on Uncertainty in AI, Cambridge, Mass., July 27-29, pp. 317-325, 1990.

6. S. Guiasu, Information Theory with Applications, McGraw-Hill, London, 1977.

7. J. Hartmanis, The Application of Some Basic Inequalities for Entropy, Information and Control, Vol. 2, pp. 199-213, 1959.

8. IMSL, Nonlinearly Constrained Minimization Using Finite-Difference Gradient, IMSL Math/Library User's Manual, pp. 895-908, 1987.

9. G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice Hall, Englewood Cliffs, NJ, 1988.

10. P.J. Lingras, Qualitative and Quantitative Reasoning Under Uncertainty in Intelligent Information Systems, Unpublished Ph.D. Dissertation, Department of Computer Science, University of Regina, Sask., Canada, 1991.

11. P.J. Lingras and S.K.M. Wong, Two Perspectives of the Dempster-Shafer Theory of Belief Functions, to appear in The International Journal of Man-Machine Studies, 1990.

12. J.R. Quinlan, Inductive inference as a tool for the construction of high-performance programs, Machine Learning, R.S. Michalski, T.M. Mitchell and J. Carbonell (eds.), Palo Alto, CA: Tioga, 1983.

13. G. Shafer, A Mathematical Theory of Evidence, Princeton, N.J.: Princeton University Press, 1976.

14. G. Shafer, Belief functions and possibility measures, The Analysis of Fuzzy Information 1, J.C. Bezdek (ed.), CRC Press, 1986.

15. C.E. Shannon, A mathematical theory of communication, Bell System Technical Journal, Vol. 27, pp. 379-423, 1948.

16. H.E. Stephanou and S. Lu, Measuring Consensus Effectiveness by a Generalized Entropy Criterion, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 4, pp. 544-554, July 1988.

17. S.K.M. Wong and P.J. Lingras, Unification of the Bayes Conditionalization and the Dempster Rule by Minimizing Information Gain, to appear in the Proceedings of the Sixth Conference on Uncertainty in AI, Cambridge, Mass., July 27-29, 1990.

18. S.K.M. Wong, W. Ziarko and R. Le Ye, Comparison of rough-set and statistical method in inductive learning, International Journal of Man-Machine Studies, 25, pp. 53-72, 1986.

19. R.R. Yager, Entropy and Specificity in a Mathematical Theory of Evidence, International Journal of General Systems, Vol. 9, pp. 249-260, 1983.


Reasoning about Higher Order Uncertainty in
Possibilistic Logic


Churn-Jung Liau¹ and Bertrand I-Peng Lin²


¹ Institute of Information Science, Academia Sinica,
Taipei, Taiwan, ROC

² Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan, ROC


Abstract. Possibilistic logic is an important approach for reasoning about possibility and necessity. By formulating possibilistic reasoning as a kind of modal logic, we can represent and reason about higher order uncertainty to any nesting depth. In this paper, we present a system with built-in axioms for reasoning about higher order uncertainty. Some intuition behind the system is discussed, and its soundness and completeness with respect to the class of finite transitive and serial (in the fuzzy set-theoretic sense) possible-world models are proved.


1  Introduction 

Possibilistic logic [1] is an important approach for reasoning about possibility and necessity. In this logic, two certainty measures, called possibility and necessity measures respectively, are assigned to the well-formed formulas of classical logic. Thus, if $f$ is a well-formed formula (wff) of propositional logic, then $(f\,(\Pi\,c))$ and $(f\,(N\,c))$ ($c \in [0,1]$) are wffs of possibilistic logic. In this way, the uncertainty of a crisp proposition can easily be represented. However, it is usually difficult to estimate the possibility (and dually, the necessity) of a proposition precisely. Thus, the possibility and necessity of a proposition are themselves subject to judgement and evaluation. For example, given the information "John is tall", we might estimate that the necessity of John's height being over six feet is at least 0.8. However, we may seriously doubt the correctness of this estimate, so we need to talk about the uncertainty of the estimate itself. This kind of uncertainty about an estimate is called higher order uncertainty. By formulating possibilistic logic as a kind of multi-modal language and exploiting nested modal operators, we can represent higher order uncertainty naturally. In [4], we have developed such a logic and its axiomatic system, called quantitative modal logic (QML). The remaining problem is what additional axioms are needed for reasoning about higher order uncertainty. This paper is devoted to the solution of this problem.

In the next section, we first review the underlying logic QML. Then, the axioms for higher order uncertainty reasoning are presented and their implications are discussed. Finally, the soundness and completeness of the extended axiomatic system are considered.




2  Review  of  Quantitative  Modal  Logic 

2.1  Syntax 

QML is an extension of propositional logic with four classes of quantitative modal operators: $(c)$, $(c)^+$, $[c]$ and $[c]^+$ for all $c \in [0,1]$.

The syntactic formation rules of QML consist of those for propositional logic and the following one:

if $f$ is a wff, so are $(c)f$, $(c)^+f$, $[c]f$ and $[c]^+f$ for all $c \in [0,1]$.

We usually use lower case letters (sometimes with indices) $p, q, r$ to denote atomic formulas and $f, g, h$ to denote wffs. We also assume all classical logical connectives are available, either as primitives or as abbreviations.

2.2  Semantics 

The semantics of QML is a generalization of the standard Kripke semantics for modal logic [3]. Define a possibility frame $F = (W, R)$, where $W$ is a set of possible worlds and $R : W \times W \to [0,1]$ is a fuzzy accessibility relation on $W$. Let $PV$ and $FA$ denote the set of all propositional variables and the set of all wffs, respectively. Then a model of QML is a triple $M = (W, R, TA)$, where $(W, R)$ is a possibility frame and $TA : W \times PV \to \{0, 1\}$ is a truth value assignment for all worlds. A proposition $p$ is said to be true at a world $w$ iff $TA(w, p) = 1$. Given $R$, we can define a possibility distribution $R_w$ for each $w \in W$ such that $R_w(s) = R(w, s)$ for all $s$ in $W$. Similarly, we can also define $TA_w$ for each $w \in W$ such that $TA_w(p) = TA(w, p)$ for all $p$ in $PV$. Thus, mathematically, a model can be equivalently written as $(W, (R_w, TA_w)_{w \in W})$. Given a model $M = (W, R, TA)$, we can define the satisfaction relation $\models_M \subseteq W \times FA$ as follows. First, let us define

$$N_w(f) = \inf\{1 - R(w, s) \mid s \models_M \neg f,\ s \in W\}.$$

Consequently, $N_w$ is just the necessity measure induced by the possibility distribution $R_w$, as defined by Dubois and Prade [1]. Then $w \models_M [c]f$ (resp. $w \models_M [c]^+f$) iff $N_w(f) \ge c$ (resp. $N_w(f) > c$). The satisfaction relations for $(c)f$ and $(c)^+f$ are defined analogously by replacing $N_w(f)$ with $\Pi_w(f) = 1 - N_w(\neg f)$. Here, for convenience, we define $\sup \emptyset = 0$ and $\inf \emptyset = 1$. Furthermore, the satisfaction of all other wffs is defined as usual in classical logic.

A wff $f$ is said to be valid in $M = (W, R, TA)$, written $\models_M f$, iff $w \models_M f$ for all $w \in W$. If $S$ is a set of wffs, then $\models_M S$ means that $\models_M f$ for all $f \in S$. If $C$ is a class of models and $S$ is a set of wffs, then we write $S \models_C f$ to mean that for all $M \in C$, $\models_M S$ implies $\models_M f$. A model is called serial if $\sup_{s \in W} R(w, s) = 1$ for all $w \in W$, and finite if $W$ is finite.
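The satisfaction clauses above can be evaluated mechanically on a finite model, where inf and sup become min and max with the conventions $\inf \emptyset = 1$ and $\sup \emptyset = 0$. The sketch below uses a hypothetical tuple encoding of wffs (not part of [4]) and computes $\Pi_w(f)$ directly as $\sup\{R(w, s) \mid s \models_M f\}$, which equals $1 - N_w(\neg f)$.

```python
# Hypothetical encoding of QML wffs as nested tuples:
#   ("atom", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g),
#   ("nec", c, f) = [c]f,  ("nec+", c, f) = [c]^+ f,
#   ("pos", c, f) = (c)f,  ("pos+", c, f) = (c)^+ f.
# A finite model is (worlds, R, TA): R[w][s] in [0, 1], TA[w] = set of true atoms.


def necessity(model, w, f):
    """N_w(f) = inf{1 - R(w, s) | s satisfies not f}, with inf of the empty set = 1."""
    worlds, r, _ = model
    return min((1 - r[w][s] for s in worlds if not sat(model, s, f)), default=1.0)


def possibility(model, w, f):
    """Pi_w(f) = 1 - N_w(not f), computed directly as sup{R(w, s) | s satisfies f}."""
    worlds, r, _ = model
    return max((r[w][s] for s in worlds if sat(model, s, f)), default=0.0)


def sat(model, w, f):
    """Satisfaction relation w |= f for a finite model."""
    op = f[0]
    if op == "atom":
        return f[1] in model[2][w]
    if op == "not":
        return not sat(model, w, f[1])
    if op == "and":
        return sat(model, w, f[1]) and sat(model, w, f[2])
    if op == "or":
        return sat(model, w, f[1]) or sat(model, w, f[2])
    if op == "imp":
        return (not sat(model, w, f[1])) or sat(model, w, f[2])
    c, g = f[1], f[2]
    if op == "nec":
        return necessity(model, w, g) >= c
    if op == "nec+":
        return necessity(model, w, g) > c
    if op == "pos":
        return possibility(model, w, g) >= c
    if op == "pos+":
        return possibility(model, w, g) > c
    raise ValueError(f"unknown operator {op!r}")


def is_serial(model):
    """Serial: sup_s R(w, s) = 1 for every world (max suffices for finite W)."""
    worlds, r, _ = model
    return all(max(r[w][s] for s in worlds) == 1 for w in worlds)


# Three worlds; p holds in worlds 0 and 1 but not in world 2.
R = [[1.0, 0.6, 0.2],
     [0.0, 1.0, 0.3],
     [0.0, 0.0, 1.0]]
model = ([0, 1, 2], R, [{"p"}, {"p"}, set()])
p = ("atom", "p")
print(necessity(model, 0, p))          # N_0(p) = 1 - R(0, 2)
print(sat(model, 0, ("nec", 0.75, p)))
print(is_serial(model))
```

Here world 2 is the only world falsifying $p$, so $N_0(p) = 1 - R(0, 2)$ and world 0 satisfies $[c]p$ exactly for $c \le 0.8$.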

2.3 Axiomatic system for possibilistic reasoning

An axiomatic system for the class of all serial models has been developed in [4], and we call it D. The system consists of the following axiom schemata and rules of inference.




1. Inequality constraints:

(a) AX1 (monotonicity):

$[c]f \supset [d]^+f$, if $c > d$

(b) AX2 (dichotomy):

$[c]^+f \supset [c]f$

(c) AX3 (lower and upper bounds):

$[0]f$, $\neg[1]^+f$

2. Propositional reasoning:

AX4: a set of complete axioms for propositional logic.

3. Possibility and necessity (AXK):

(a) $[c](f \supset g) \supset ([c]f \supset [c]g)$.

(b) $[c]^+(f \supset g) \supset ([c]^+f \supset [c]^+g)$.

4. Seriality (AXD):

$[0]^+f \supset (1)f$

5. Rules of inference:

(a) R1 (Modus Ponens):

$$\frac{f, \quad f \supset g}{g}$$

(b) R2 (Rule of necessitation):

$$\frac{f}{[1]f}$$

Formally, if $\mathcal{D}$ is the class of all serial models, then for any finite set $B$ of wffs and wff $f$, we have $B \models_{\mathcal{D}} f$ iff $f$ is derivable from $B$ in the system D.

3  Axioms  for  Higher  Order  Uncertainty  Reasoning 

In [4], it is also shown that the system D is suitable for possibilistic reasoning. However, to handle higher order uncertainty, additional axioms may be needed. The question is: "which additional axioms are needed?" There may be many answers to this question, depending on the meaning of necessity and possibility. In traditional modal logic, the most basic axiom about nested modality is as follows:

$$\Box f \supset \Box\Box f.$$

This axiom has an intuitive interpretation in epistemic logic. It says that if some agent knows $f$, then he knows that he knows $f$. The axiom is valid in all transitive possible-world models for ordinary modal logic. Thus, it is natural to investigate the corresponding model class in the QML case. Define a model for QML as (sup-min) transitive iff for all $w, w' \in W$, we have

$$R(w, w') \ge \sup_{s \in W} \min\big(R(w, s), R(s, w')\big).$$
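For a finite model the sup in this condition is a max, so transitivity can be checked directly, and any finite fuzzy relation can be made transitive by taking its max-min transitive closure (iterating $R := \max(R, R \circ R)$ until nothing changes). The closure is a standard fuzzy-relation construction used here only for illustration; the function names are hypothetical.

```python
def compose_max_min(r):
    """(R o R)(w, v) = max_s min(R(w, s), R(s, v)) for a finite relation."""
    n = len(r)
    return [[max(min(r[w][s], r[s][v]) for s in range(n)) for v in range(n)]
            for w in range(n)]


def is_transitive(r):
    """Sup-min transitivity: R(w, v) >= max_s min(R(w, s), R(s, v)) for all w, v."""
    comp = compose_max_min(r)
    n = len(r)
    return all(r[w][v] >= comp[w][v] for w in range(n) for v in range(n))


def transitive_closure(r):
    """Smallest sup-min transitive relation containing r.

    Entries only grow and only take values already present in r,
    so the iteration terminates.
    """
    n = len(r)
    current = [row[:] for row in r]
    while True:
        comp = compose_max_min(current)
        nxt = [[max(current[w][v], comp[w][v]) for v in range(n)] for w in range(n)]
        if nxt == current:
            return current
        current = nxt


R = [[1.0, 0.6, 0.2],
     [0.0, 1.0, 0.3],
     [0.0, 0.0, 1.0]]
print(is_transitive(R))      # R(0, 2) = 0.2 < min(R(0, 1), R(1, 2)) = 0.3
T = transitive_closure(R)
print(T[0][2], is_transitive(T))
```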




Parallel to the above-mentioned axiom for nested modality, we can prove that the following axioms are valid in all transitive models:

AX4.1: $[c]f \supset [c][c]f$

and

AX4.2: $[c]^+f \supset [c][c]^+f$

However, the axiom AX4.2 is somewhat too weak to obtain a complete axiom system for transitive models. The main reason is that we consider finite as well as infinite models, while our inference rules and axioms are essentially finitary. Thus, if we are concerned only with finite models, the axiom can be strengthened as follows:

AX4'.2: $[c]^+f \supset [c]^+[c]^+f$

We define D4 as the union of D and the axioms AX4.1 and AX4'.2. Then D4 has the following reasonable derived axioms:

$$(c)(d)f \supset (\min(c, d))f,$$

$$(c)^+(d)^+f \supset (\min(c, d))^+f.$$

Consequently, these axioms allow us to collapse a sequence of possibility operators into a single one.


4  Soundness  and  Completeness 

4.1  Soundness 

In this section, we prove that the system D4 is sound and complete for the class of all finite serial transitive models.

Let us consider the soundness theorem first. It is based on the following two lemmas.

Lemma 1. If $C$ is a class of transitive models, then

(1) $\models_C [c]f \supset [c][c]f$

(2) $\models_C [c]^+f \supset [c][c]^+f$

Proof:

(1) Let $M \in C$ and $w \models_M [c]f$. Define

$$W_1 = \{u \mid u \models_M \neg f\}, \qquad W_2 = \{u \mid u \models_M \neg[c]f\}.$$

Then, by assumption and definition,

$$\inf_{u \in W_1} \big(1 - R(w, u)\big) \ge c$$

and

$$\forall w' \in W_2, \quad \inf_{u \in W_1} \big(1 - R(w', u)\big) < c.$$

This implies $\sup_{u \in W_1} R(w, u) \le 1 - c$, and $\sup_{u \in W_1} R(w', u) > 1 - c$ for all $w' \in W_2$. Then, by transitivity of $R$,

$$R(w, u) \ge \min\big(R(w, w'), R(w', u)\big), \quad \forall w', u \in W.$$

Thus,

$$\sup_{u \in W_1} R(w, u) \ge \sup_{u \in W_1} \min\big(R(w, w'), R(w', u)\big) = \min\Big(R(w, w'), \sup_{u \in W_1} R(w', u)\Big).$$

Now,

$$\min\Big(R(w, w'), \sup_{u \in W_1} R(w', u)\Big) \le \sup_{u \in W_1} R(w, u) \le 1 - c,$$

so $R(w, w') \le 1 - c$ for all $w' \in W_2$, since $\sup_{u \in W_1} R(w', u) > 1 - c$ for all $w' \in W_2$. This results in $\inf_{w' \in W_2} (1 - R(w, w')) \ge c$, i.e. $w \models_M [c][c]f$.

(2) Assume $w \models_M [c]^+f$; we redefine $W_2 = \{u \mid u \models_M \neg[c]^+f\}$. Then, following an analogous argument as above, we have

$$R(w, w') < 1 - c, \quad \forall w' \in W_2,$$

and this results in $\inf_{w' \in W_2} (1 - R(w, w')) \ge c$. That is, $w \models_M [c][c]^+f$.

□

Lemma 2. If $C$ is a class of finite transitive models, then $\models_C [c]^+f \supset [c]^+[c]^+f$.

Proof: In the proof of Lemma 1(2), we can obtain $\inf_{w' \in W_2} (1 - R(w, w')) > c$ at the last step, since when $W$ is finite, the function "inf" can be replaced by "min". □

Let $\mathcal{D}4$ denote the class of all finite serial and transitive models for QML. Then we have the following theorem.

Theorem 3 (Soundness). Let $S$ be a set of wffs and $f$ be a wff. If $f$ is derivable from $S$ in system D4, then $S \models_{\mathcal{D}4} f$.

Proof: The preceding lemmas show that the axioms AX4.1 and AX4'.2 are valid in $\mathcal{D}4$. Combining this with the soundness theorem of the system D proves the required result.
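Lemmas 1 and 2 can be spot-checked by brute force. The sketch below builds random finite serial, transitive models (a reflexive diagonal plus the max-min transitive closure, an assumption made only to guarantee seriality and transitivity) and searches for a world falsifying AX4.1, AX4'.2, or the two derived possibility axioms, for an atom and its negation. It reuses a hypothetical tuple encoding of wffs; finding no counterexample is evidence, not a proof. The last lines show that AX4.1 can fail once transitivity is dropped.

```python
import random

# Hypothetical encoding: ("atom", "p"), ("not", f), ("imp", f, g),
# ("nec", c, f) = [c]f, ("nec+", c, f) = [c]^+ f, ("pos", c, f), ("pos+", c, f).


def sat(model, w, f):
    worlds, r, ta = model
    op = f[0]
    if op == "atom":
        return f[1] in ta[w]
    if op == "not":
        return not sat(model, w, f[1])
    if op == "imp":
        return (not sat(model, w, f[1])) or sat(model, w, f[2])
    c, g = f[1], f[2]
    if op in ("nec", "nec+"):
        # N_w(g) = inf{1 - R(w, s) | s |= not g}, inf of the empty set = 1
        n = min((1 - r[w][s] for s in worlds if not sat(model, s, g)), default=1.0)
        return n >= c if op == "nec" else n > c
    # Pi_w(g) = sup{R(w, s) | s |= g}, sup of the empty set = 0
    pi = max((r[w][s] for s in worlds if sat(model, s, g)), default=0.0)
    return pi >= c if op == "pos" else pi > c


def transitive_closure(r):
    n = len(r)
    while True:
        nxt = [[max(r[w][v], max(min(r[w][s], r[s][v]) for s in range(n)))
                for v in range(n)] for w in range(n)]
        if nxt == r:
            return r
        r = nxt


def random_model(rng, n):
    """Random finite serial, sup-min transitive model with one atom p."""
    grid = [i / 10 for i in range(11)]
    r = [[rng.choice(grid) for _ in range(n)] for _ in range(n)]
    for w in range(n):
        r[w][w] = 1.0                  # reflexive, hence serial
    r = transitive_closure(r)          # closure only raises entries
    ta = [{"p"} if rng.random() < 0.5 else set() for _ in range(n)]
    return (list(range(n)), r, ta)


def no_counterexample(trials=100, seed=1):
    rng = random.Random(seed)
    levels = [i / 4 for i in range(5)]
    for _ in range(trials):
        model = random_model(rng, rng.randint(1, 5))
        for f in (("atom", "p"), ("not", ("atom", "p"))):
            for c in levels:
                schemas = [
                    ("imp", ("nec", c, f), ("nec", c, ("nec", c, f))),      # AX4.1
                    ("imp", ("nec+", c, f), ("nec+", c, ("nec+", c, f))),   # AX4'.2
                ]
                for d in levels:
                    m = min(c, d)
                    schemas.append(("imp", ("pos", c, ("pos", d, f)), ("pos", m, f)))
                    schemas.append(("imp", ("pos+", c, ("pos+", d, f)), ("pos+", m, f)))
                if not all(sat(model, w, s) for w in model[0] for s in schemas):
                    return False
    return True


print(no_counterexample())

# A serial but non-transitive model: R(0, 2) < min(R(0, 1), R(1, 2)).
R_bad = [[1.0, 0.6, 0.2], [0.0, 1.0, 0.3], [0.0, 0.0, 1.0]]
bad = ([0, 1, 2], R_bad, [{"p"}, {"p"}, set()])
p = ("atom", "p")
print(sat(bad, 0, ("imp", ("nec", 0.75, p), ("nec", 0.75, ("nec", 0.75, p)))))  # False
```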




4.2  Completeness 

We adopt a Henkin-style construction to prove completeness. This is essentially analogous to the method used by Fitting [2] in proving completeness for ordinary modal logic. The critical difference is induced by the fuzziness of the accessibility relation.

To simplify the presentation, we use the uniform notation for modal logic. The uniform notation is based on a classification of the non-literal wffs into six categories according to the formula's main connective, which is used to combine its immediate subformulas. We list the classification in Tables 1-4. We will abuse the notation and use $\alpha, \alpha_1, \ldots$, etc., to denote wffs of the respective types and their immediate subformulas.


Table 1. $\alpha$ wffs and their component formulas

$\alpha$               $\alpha_1$    $\alpha_2$
$f \wedge g$           $f$           $g$
$\neg(f \vee g)$       $\neg f$      $\neg g$
$\neg(f \supset g)$    $f$           $\neg g$
$\neg\neg f$           $f$           $f$

Table 2. $\beta$ wffs and their component formulas

$\beta$                $\beta_1$     $\beta_2$
$f \vee g$             $f$           $g$
$\neg(f \wedge g)$     $\neg f$      $\neg g$
$f \supset g$          $\neg f$      $g$

Table 3. $\nu$ and $\nu^+$ wffs (with parameter $c$) and their component formulas

$\nu(c)$               $\nu_0(c)$    $\nu^+(c)$             $\nu_0^+(c)$
$[c]f$                 $f$           $[c]^+f$               $f$
$\neg(1-c)^+f$         $\neg f$      $\neg(1-c)f$           $\neg f$

The completeness theorem is based on a more general theorem, called the model existence theorem. To state and prove the model existence theorem, we need some further definitions.




Table 4. $\pi$ and $\pi^+$ wffs (with parameter $c$) and their component formulas

$\pi(c)$               $\pi_0(c)$    $\pi^+(c)$             $\pi_0^+(c)$
$(c)f$                 $f$           $(c)^+f$               $f$
$\neg[1-c]^+f$         $\neg f$      $\neg[1-c]f$           $\neg f$

Definition 4. Let $S$ be a set of wffs. Then

1. Define the positive subformulas of $S$, denoted by $S^+$, as the smallest set containing $S$ and closed under the following conditions:

(a) if $\neg f$, $(c)f$, $(c)^+f$, $[c]f$, or $[c]^+f \in S^+$, then $f \in S^+$.

(b) if $f \wedge g$, $f \vee g$, or $f \supset g \in S^+$, then $f, g \in S^+$.

2. Define the negation subformulas of $S$, denoted by $S^-$, as the following set:

3. Define $Sub(S) = S^+ \cup S^-$.

4. Define the parameter value set of $S$, $V(S)$, as follows:

$$V(S) = \{c, 1 - c \mid \exists f \text{ such that } [c]f, (c)f, [c]^+f, \text{ or } (c)^+f \in Sub(S)\} \cup \{0, 1\}.$$

Also, $Sub(f) = Sub(\{f\})$ and $V(f) = V(\{f\})$ if $f$ is a single wff. Note that if $S$ is finite, then $Sub(S)$ and $V(S)$ are finite too. In what follows, let $F = Sub(G)$ for some finite set $G$.

Definition 5. Let $\Theta \subseteq 2^F$ be a collection of subsets of $F$. Then $\Theta$ is called a classical consistency property on $F$ iff for all $S \in \Theta$, $S$ satisfies the following three conditions:

(a) $\alpha \in S \Rightarrow S \cup \{\alpha_1, \alpha_2\} \in \Theta$,

(b) $\beta \in S \Rightarrow S \cup \{\beta_1\} \in \Theta$ or $S \cup \{\beta_2\} \in \Theta$, and

(c) for any atomic wff $p$, $S$ does not contain $p$ and $\neg p$ simultaneously, and none of the following wffs are in $S$: $\nu^+(1)$, $\pi^+(1)$, $\neg\top$, $\bot$.

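Definition 6. Define the world-alternative function $H$ and the strict world-alternative function $H^+$ on $F$ as follows:

$$H, H^+ : 2^F \times V(F) \to 2^F,$$

$$H(S, c) = \{\nu_0(c'), \nu(c') \mid c' \ge c,\ \nu(c') \in S\} \cup \{\nu_0^+(c'), \nu^+(c') \mid c' \ge c,\ \nu^+(c') \in S\}$$

and

$$H^+(S, c) = \{\nu_0(c'), \nu(c') \mid c' > c,\ \nu(c') \in S\} \cup \{\nu_0^+(c'), \nu^+(c') \mid c' \ge c,\ \nu^+(c') \in S\}.$$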

Definition 7. Let $\Theta \subseteq 2^F$ be a collection of subsets of $F$. Then $\Theta$ is a D4-consistency property iff it is a classical consistency property and for all $S \in \Theta$, $S$ satisfies the following three additional conditions:

(d1) $c > 0$ and $\pi(c) \in S \Rightarrow H^+(S, 1 - c) \cup \{\pi_0(c)\} \in \Theta$,

(d2) $\pi^+(c) \in S \Rightarrow H(S, 1 - c) \cup \{\pi_0^+(c)\} \in \Theta$, and

(d3) $H^+(S, 0) \in \Theta$.

Let $B \subseteq F$; then a D4-consistency property $\Theta$ is called $B$-compatible iff for all $S \in \Theta$ and $f \in B$, $S \cup \{f\} \in \Theta$.

Based on these definitions, we can state and prove the main theorem.

Theorem 8 (Model Existence Theorem). If $F$ is a finite set of wffs, $B \subseteq F$, and $\Theta$ is a $B$-compatible D4-consistency property on $F$, then there exists a finite serial and transitive model $M = (W, R, TA)$ such that $\models_M B$ and for all $f \in S \in \Theta$, there exists a world $w \in W$ such that $w \models_M f$.

Proof: To prove this theorem, we use a constructive argument. First, noting that $\Theta$ is ordered by $\subseteq$ and all elements of $\Theta$ are finite sets, we can construct the model $M_\Theta = (W, R, TA)$ as follows:

$W$: the set of maximal members in $\Theta$,

$TA(S, p) = 1 \iff p \in S$, for all $S \in W$ and $p \in PV$, and

$R$ must satisfy that for all $S, S' \in W$ and $c \in V(F)$:

(r1) if $c \ne 0$, then $H(S, c) \subseteq S' \iff R(S, S') > 1 - c$,

(r2) $H^+(S, c) \subseteq S' \iff R(S, S') \ge 1 - c$.

Then, we use a series of lemmas to show that $M_\Theta$ is well-defined and satisfies the requirements of the theorem.

Lemma 9. There exists a transitive $R$ satisfying (r1) and (r2).

Proof: According to Definition 6, we have the following results for all $S \in 2^F$ and $c, c' \in V(F)$:

(i) $H^+(S, c) \subseteq H(S, c)$,

(ii) if $c > c'$, then $H(S, c) \subseteq H(S, c')$,

(iii) if $c > c'$, then $H^+(S, c) \subseteq H^+(S, c')$,

(iv) if $c > c'$, then $H(S, c) \subseteq H^+(S, c')$,

(v) for all $S_1, S_2, S_3 \in 2^F$, if $H^+(S_1, c) \subseteq S_2$ and $H^+(S_2, c) \subseteq S_3$, then $H^+(S_1, c) \subseteq H^+(H^+(S_1, c), c) \subseteq H^+(S_2, c) \subseteq S_3$, and

(vi) for all $S_1, S_2, S_3 \in 2^F$, if $H(S_1, c) \subseteq S_2$ and $H(S_2, c) \subseteq S_3$, then $H(S_1, c) \subseteq S_3$.

Let $V(F) = \{c_1, c_2, c_3, \ldots, c_n\}$ such that $1 = c_1 > c_2 > c_3 > \cdots > c_n = 0$. Then we have the following inclusion chain for any $S \in 2^F$:

$$\emptyset = H^+(S, c_1) \subseteq H(S, c_1) \subseteq \cdots \subseteq H^+(S, c_i) \subseteq H(S, c_i) \subseteq H^+(S, c_{i+1}) \subseteq \cdots \subseteq H^+(S, c_n).$$

Thus, given $S$ and $S'$, there are three possible cases:

Case 1: for some $i$ such that $1 \le i \le n - 1$, $H^+(S, c_i) \subseteq S'$ and $H(S, c_i) \not\subseteq S'$. In this case, the only value of $R(S, S')$ which satisfies the conditions (r1) and (r2) is $1 - c_i$.

Case 2: for some $i$ such that $1 \le i \le n - 1$, $H(S, c_i) \subseteq S'$ and $H^+(S, c_{i+1}) \not\subseteq S'$. In this case, if $1 - c_i < R(S, S') < 1 - c_{i+1}$, then (r1) and (r2) can be satisfied. Thus, we can set $R(S, S')$ to a value strictly between $1 - c_i$ and $1 - c_{i+1}$ that depends only on $i$.

Case 3: if $H^+(S, c_n) \subseteq S'$, then $R(S, S') = 1$.

Then, according to (v) and (vi) above, we can see that for all $S_1, S_2, S_3 \in W$, we have $R(S_1, S_3) \ge \min(R(S_1, S_2), R(S_2, S_3))$, i.e. $R$ is a transitive fuzzy relation. □

Lemma 10. $M_\Theta$ is a serial model.

Proof: Since for any $S$ in $W$, $H^+(S, 0) \in \Theta$ by condition (d3), and each element of $\Theta$ is a finite set, we can find the maximal extension of $H^+(S, 0)$, say $S'$, in $\Theta$. Then $S' \in W$ and $H^+(S, 0) \subseteq S'$, so $R(S, S') = 1$ and $\sup_{S' \in W} R(S, S') = 1$. □

Lemma 11. For any $S \in W$ and wff $f \in S$, $S \models_{M_\Theta} f$.

Proof: By induction on the structure of $f$. For convenience, we drop the subscript $M_\Theta$ in the following proof and write $S \models f$ only. We consider the following exhaustive cases.

Case 1: $f$ is an atomic wff. By the definition of $M_\Theta$, $TA(S, f) = 1$ since $f \in S$. Thus, $S \models f$ according to the possible-world semantics.

Case 2: $f = \neg p$ for some atomic wff $p$. By condition (c) of Definition 5, $\neg p \in S$ implies $p \notin S$, and so $TA(S, p) = 0$. This in turn implies $S \models f$.

Case 3: $f$ is an $\alpha$ wff. Then $S \cup \{\alpha_1, \alpha_2\} \in \Theta$, and since $S$ is a maximal member of $\Theta$, this means $\alpha_1, \alpha_2 \in S$. By the induction hypothesis, $S \models \alpha_1$ and $S \models \alpha_2$, so $S \models f$.

Case 4: $f$ is a $\beta$ wff. Then $S \cup \{\beta_1\} \in \Theta$ or $S \cup \{\beta_2\} \in \Theta$, so $\beta_1 \in S$ or $\beta_2 \in S$ by the maximality of $S$, and hence $S \models \beta_1$ or $S \models \beta_2$ by the induction hypothesis. Thus, $S \models f$.

Case 5: $f$ is a $\pi(c)$ wff with $c > 0$ (note that if $c = 0$, then $S \models \pi(c)$ trivially). Then $H^+(S, 1 - c) \cup \{\pi_0(c)\} \in \Theta$; let $S'$ be the maximal extension of $H^+(S, 1 - c) \cup \{\pi_0(c)\}$ in $\Theta$. We have $R(S, S') \ge c$ since $H^+(S, 1 - c) \subseteq S'$. Furthermore, $\pi_0(c) \in S'$, so $S' \models \pi_0(c)$ by the induction hypothesis. Thus, $S \models f$ by the definition of $\pi(c)$ wffs and their semantics.

Case 6: $f$ is a $\pi^+(c)$ wff with $c < 1$ (note that if $c = 1$, then $\pi^+(1) \notin S$). This case is similar to Case 5.

Case 7: $f$ is a $\nu(c)$ wff. Let $S'$ be an element of $W$. If $S' \models \neg\nu_0(c)$, then $\nu_0(c) \notin S'$ by the induction hypothesis. This implies $H(S, c) \not\subseteq S'$, since $\nu_0(c) \in H(S, c)$ by definition, and in turn $1 - R(S, S') \ge c$ by the definition of $M_\Theta$. Thus we have $\inf\{1 - R(S, S') \mid S' \models \neg\nu_0(c),\ S' \in W\} \ge c$, i.e. $S \models f$.

Case 8: $f$ is a $\nu^+(c)$ wff. The proof is similar to that of Case 7, but uses the finiteness of $W$ in the last inference step. □

Note that our induction bases are literals (i.e. Cases 1 and 2), not just atomic formulas, so we can infer the results in all other cases.

Now we can complete the proof of the model existence theorem. First, since $\Theta$ is $B$-compatible, we have $B \subseteq S$ for any $S \in W$, by the maximality of $S$. Thus, for all $S \in W$, $S \models f$ if $f \in B$, by Lemma 11. That is, $\models_{M_\Theta} B$. Moreover, for all $f \in S \in \Theta$ there exists a maximal extension of $S$, say $S'$, in $W$, so $S' \models f$ by Lemma 11. Finally, $W$ is finite, and $M_\Theta$ is transitive and serial by Lemmas 9 and 10. □

We can now consider the completeness of the D4 system.


Theorem 12. Let $B$ be a finite set of wffs and $f$ be a wff. Then $B \models_{\mathcal{D}4} f$ implies that $f$ is derivable from $B$ in D4.

Proof: Let $F = Sub(B \cup \{f\})$. Define $\Theta = \{S \subseteq F \mid B \nvdash_{D4} \neg\bigwedge S\}$, where "$\bigwedge S$" denotes the conjunction of all wffs in $S$ and $B \nvdash_{D4} f$ means that $f$ is not derivable from $B$ in D4. It can be shown that $\Theta$ is a $B$-compatible D4-consistency property on $F$. Thus, if $B \nvdash_{D4} f$, then $\{\neg f\} \in \Theta$, and by the model existence theorem, we have $B \not\models_{\mathcal{D}4} f$. □

5  Concluding  Remarks 

We have provided a system with built-in axioms for higher order uncertainty reasoning. However, we do not claim that D4 is the only possible system for this purpose. In fact, if more constraints are imposed on the possible models, then some additional axioms can be derived. For example, if we require that the fuzzy accessibility relation is a similarity relation as in [5], then two additional axioms, $[0]^+f \supset f$ (corresponding to reflexivity) and $f \supset [1 - c](c)^+f$ (corresponding to symmetry), could be added to D4.


References 

1. D. Dubois and H. Prade. "An introduction to possibilistic and fuzzy logics." In: Non-standard Logics for Automated Reasoning (P. Smets et al., eds.), 287-325, Academic Press, 1988.

2. M. C. Fitting. Proof Methods for Modal and Intuitionistic Logics. Vol. 169 of Synthese Library, D. Reidel Publishing Company, 1983.

3. S. A. Kripke. "Semantical considerations on modal logic." Acta Philosophica Fennica, 16:83-94, 1963.

4. C. J. Liau and I. P. Lin. "Quantitative modal logic and possibilistic reasoning." Proc. of ECAI 92, 43-47, 1992.

5. E. H. Ruspini. "On the semantics of fuzzy logic." International Journal of Approximate Reasoning, 5:45-88, 1991.


Approximation Methods for Knowledge
Representation Systems


Cecylia M. Rauszer

Institute of Mathematics, University of Warsaw
02-097 Warsaw, Banacha 2, Poland
e-mail: rauszer@mimuw.edu.pl


Abstract. In the paper we examine an approximation logic which may serve as a tool in investigations of distributive knowledge representation systems. The concept of knowledge is based on the rough set approach. The logic under consideration is strong enough to capture properties of knowledge understood as partitions of the universe, lower and upper approximations of sets of objects, as well as properties of dependencies of attributes.


1  Introduction 

Reasoning about knowledge has long been an issue of concern in philosophy and artificial intelligence. A number of papers addressing the problem, e.g. [1], [2], [4], [5], have confirmed the important role of this domain.

One of the difficulties which occur when we want to reason formally about knowledge is the lack of agreement on what we mean by knowledge, or on what properties knowledge satisfies or should satisfy. This problem is particularly important when we want to find a good semantics which allows us to reason about knowledge.

In this paper, we focus on the rough-set approach ([6], [7]) to modeling knowledge. In our approach knowledge is understood as the ability to classify objects. By objects we mean anything a human being can think of. Objects form the universe of discourse $U$, called the universe for short. Any partition of $U$ is said to be knowledge about the universe, or simply knowledge. For instance, if $U = \{o_1, o_2, o_3, o_4, o_5\}$ then the partitions $\mathcal{E}_s = \{\{o_1, o_2, o_3\}, \{o_4\}, \{o_5\}\}$ and $\mathcal{E}_t = \{\{o_1, o_2\}, \{o_3, o_4\}, \{o_5\}\}$ are examples of knowledge about the universe $U$.

To give some intuition for the problems discussed in the paper, it is convenient to assume that each partition $\mathcal{E}_t$ is viewed as the knowledge of an agent $t$ about the universe $U$.

One of the problems we are interested in is how to represent knowledge. Suppose that $U$ is the universe of discourse and let $\mathcal{E}_t$ denote any knowledge about $U$. What we want to find is a "real" description of objects which classifies them in the same way as $\mathcal{E}_t$.

For this purpose, we use the so-called information systems [7]. An information system in our approach is in fact a data table with columns labelled by attributes and rows labelled by objects. Each row in the data table represents information about the corresponding object. In general, in a given information system we are not able to distinguish all single objects (using the attributes of the system): objects can have the same values on some or all attributes. As a consequence, any set of attributes divides the universe $U$ into classes which form a partition of the set of all objects $U$. Suppose now that $S_t$ is an information system describing the universe $U$ by means of a set of attributes $A_t$. Denote it by $(U, A_t)$, that is, $S_t = (U, A_t)$, and let $\mathcal{E}$ be the partition given by $A_t$. If $\mathcal{E} = \mathcal{E}_t$, then we say that $S_t$ represents knowledge $\mathcal{E}_t$ and call $S_t$ a knowledge representation system of $\mathcal{E}_t$. Clearly, such a representation is not unique.

Now, let $\mathcal{E} = \{\mathcal{E}_t\}_{t \in T}$ be a family of partitions of $U$, that is, each $\mathcal{E}_t$ is considered as knowledge about $U$. Let $\{S_t\}_{t \in T}$ be the family of knowledge representation systems for $\mathcal{E}$, that is, for every $t$, $S_t$ is a knowledge representation system of $\mathcal{E}_t$. We will call the family $\{S_t\}_{t \in T}$ distributive knowledge representation systems of $\mathcal{E}$. Notice that each $S_t$ may be viewed as a perception of the universe $U$ by agent $t$.

Now, let $\mathcal{E}_s$ and $\mathcal{E}_t$ be two different classifications of objects from $U$. We define the so-called strong common knowledge of $\mathcal{E}_s$ and $\mathcal{E}_t$, denoted by $\mathcal{E}_{s \vee t}$, and the weak common knowledge of $\mathcal{E}_s$ and $\mathcal{E}_t$, denoted by $\mathcal{E}_{s \wedge t}$. Intuitively speaking, strong common knowledge is a partition of the universe such that a better recognition of a set of objects by one classification is preserved. In other words, better knowledge about objects is reflected in $\mathcal{E}_{s \vee t}$. If $\mathcal{E}_s$ is interpreted as the knowledge of an agent $s$ and $\mathcal{E}_t$ as the knowledge of an agent $t$ about the same universe $U$, then $\mathcal{E}_{s \vee t}$ may be viewed as the knowledge of the group $\{s, t\}$, denoted by $st$, which intuitively may be described as follows. If an agent $s$ knows that an object $x$ is different from objects $x_{i_1}, \ldots, x_{i_k}$, then $x$ has to be different from those objects in $\mathcal{E}_{s \vee t}$. If for an agent $t$, according to her knowledge about the universe $U$, the same object $x$ differs from $x_{j_1}, \ldots, x_{j_m}$, then the object $x$ is perceived by the group $st$ as an object which is different from $x_{i_1}, \ldots, x_{i_k}, x_{j_1}, \ldots, x_{j_m}$. As a consequence we have: if an object $x$ is distinguished from $y$ in $\mathcal{E}_{s \vee t}$, that is, the block $[x]$ is different from the block $[y]$ (in $\mathcal{E}_{s \vee t}$), then at least one of the agents $s, t$ distinguishes $x$ from $y$. If objects $x$ and $y$ are indiscernible in $\mathcal{E}_{s \vee t}$, then each agent from the group $st$, according to her knowledge, is not able to distinguish between $x$ and $y$. For instance, in the above example the partition $\mathcal{E}_s$ classifies the object $o_4$ better than knowledge $\mathcal{E}_t$ does: using knowledge $\mathcal{E}_t$, the objects $o_3$ and $o_4$ are indiscernible. Hence, the class $\{o_4\}$ has to appear in $\mathcal{E}_{s \vee t}$.

The notion of weak common knowledge may be interpreted as knowledge where weaker knowledge about objects dominates in $\mathcal{E}_{s \wedge t}$. In other words, if an agent $s$ identifies $x$ with $y$, that is, $\{x, y\}$ is a block in $\mathcal{E}_s$, then the group $st$ also identifies $x$ with $y$. If moreover $t$ identifies $y$ with $z$ and distinguishes between $x$ and $y$, that is, $\{y, z\} \in \mathcal{E}_t$, then the block $\{x, y, z\}$ belongs to $\mathcal{E}_{s \wedge t}$. As a consequence, if $[x] = \{x\} \in \mathcal{E}_{s \wedge t}$, then there is consensus of the agents $s$ and $t$ about the object $x$, i.e., $[x] \in \mathcal{E}_s \cap \mathcal{E}_t$. If $\{x_1, \ldots, x_n\} \in \mathcal{E}_{s \wedge t}$, then either there is consensus of the agents $s$ and $t$ about $x_1, \ldots, x_n$, or there is no consensus about any proper subset of $\{x_1, \ldots, x_n\}$. For instance, because the objects $o_3$ and $o_4$ are indiscernible by $\mathcal{E}_t$, they have to remain indiscernible in $\mathcal{E}_{s \wedge t}$. Moreover, $o_3$ is not distinguished from the objects $o_1, o_2$ by $\mathcal{E}_s$. Thus the block $\{o_1, o_2, o_3, o_4\}$ is an element of the partition $\mathcal{E}_{s \wedge t}$.
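The example can be checked mechanically. The Python sketch below is our own illustration, not code from the paper, and the helper names `part`, `meet` and `join` are hypothetical. A partition is encoded as a frozenset of frozensets; `meet` forms all nonempty intersections of blocks (the strong common knowledge $\mathcal{E}_{s \vee t}$) and `join` merges chains of overlapping blocks (the weak common knowledge $\mathcal{E}_{s \wedge t}$).

```python
from itertools import product


def part(*blocks):
    """Build a partition (a frozenset of frozensets) from the given blocks."""
    return frozenset(frozenset(b) for b in blocks)


def meet(p, q):
    """Strong common knowledge: all nonempty intersections of blocks of p and q."""
    return frozenset(a & b for a, b in product(p, q) if a & b)


def join(p, q):
    """Weak common knowledge: merge every chain of overlapping blocks."""
    merged = []  # pairwise disjoint sets built so far
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            merged.remove(m)  # absorb each component that overlaps `block`
            block |= m
        merged.append(block)
    return frozenset(frozenset(m) for m in merged)


E_s = part({"o1", "o2", "o3"}, {"o4"}, {"o5"})
E_t = part({"o1", "o2"}, {"o3", "o4"}, {"o5"})

strong = meet(E_s, E_t)  # E_{s∨t} in the notation of this section
weak = join(E_s, E_t)    # E_{s∧t}
print(sorted(sorted(b) for b in strong))  # [['o1', 'o2'], ['o3'], ['o4'], ['o5']]
print(sorted(sorted(b) for b in weak))    # [['o1', 'o2', 'o3', 'o4'], ['o5']]
```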

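We have shown that the family $\mathcal{E} = \{\mathcal{E}_t\}_{t \in T}$ may be considered as a lattice. Moreover, $\mathcal{E}_{s \vee t}$ is the infimum of $\mathcal{E}_s$ and $\mathcal{E}_t$ in $\mathcal{E}$, and $\mathcal{E}_{s \wedge t}$ is the supremum of $\mathcal{E}_s$ and $\mathcal{E}_t$ in $\mathcal{E}$.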

The operation of weak common knowledge seems to be more interesting than the operation of strong common knowledge. Namely, we show that if $S_s = (U, A_s)$ is a knowledge representation system of $\mathcal{E}_s$ and $S_t = (U, A_t)$ is a knowledge representation system of $\mathcal{E}_t$, then the system $(U, A_s \cup A_t)$ represents the strong common knowledge of the group $st$, i.e., $\mathcal{E}_{s \vee t} = \mathcal{E}_{A_s \cup A_t}$. In the case of weak common knowledge we have: if $\mathcal{E}_{s \wedge t}$ is the weak common knowledge of $\mathcal{E}_s$ and $\mathcal{E}_t$, then there is a knowledge representation system $(U, A)$ such that the partition determined by $A$ equals $\mathcal{E}_{s \wedge t}$. However, the set of attributes $A$ need not be the same as the set $A_s \cap A_t$.

It is also shown that the family $\{S_t\}_{t \in T}$ of distributive knowledge representation systems may be treated as a lattice. The intuitive meaning of the lattice ordering $\leq$ is that if $s \leq t$, then the sharpness of perception of $U$, and therefore the feature recognition of objects from the universe $U$, by agent $s$ is weaker than that of agent $t$.

In the paper we examine lower and upper approximations of sets of objects [8] in distributive knowledge representation systems. Intuitively speaking, by the lower approximation of $X$ in $S = (U, A)$ we mean the set of objects of $U$ which without any doubt belong to $X$. The upper approximation of $X$ is the set of objects which could possibly be classified as elements of $X$. Finally, we consider the boundary of $X$, which is in a sense the undecidable area of the universe.

Finally, we introduce and examine a formal system called approximation logic. This logic is intended to reflect properties of knowledge (in our sense), properties of approximations and some other features of distributive knowledge representation systems. The idea of this logic is based on [Ra]. Roughly speaking, our logic is a modal logic with a finite number of modal operators. Each modal operator corresponds to knowledge represented by an information system. A formula of the form $I_B\varphi$ may be interpreted as the lower approximation of the set of objects which satisfy $\varphi$ in the information system $S = (U, A)$.

2  Knowledge  Base 

In this paper, knowledge is understood as the ability to classify objects. By objects we mean anything a human being can think of, for example abstract concepts, real things, processes, states, etc. Objects are treated as elements of a real or abstract world called the universe of discourse, or shortly the universe. Hence, in our approach knowledge is strictly connected with the classification of parts of the universe.

We  explain  this  idea  more  precisely. 




Let $U$ be the universe of discourse. Any subset $X$ of $U$ is said to be a concept. Any family of concepts which forms a partition of $U$ will be referred to as knowledge about $U$, or shortly knowledge.

Let $R$ be an equivalence relation over $U$. By $\mathcal{E}_R$ we denote the family of all equivalence classes (blocks) of $R$. Thus, in our terminology, $\mathcal{E}_R$ is knowledge about $U$ and each block is considered as a concept. So, if $R$ is an equivalence relation, then $R$ determines a classification of objects from the universe $U$. If a block $[x]_R \in \mathcal{E}_R$ contains more than one object, then the objects from $[x]_R$ are not distinguishable with respect to knowledge $\mathcal{E}_R$.

Let $\mathcal{R} = \{R_t\}_{t \in T}$, where $T$ is a finite set, be a family of equivalence relations over $U$. By a knowledge base we mean any relational system $\mathcal{K} = (U, \{R_t\}_{t \in T})$. Each $R_t$, where $t \in T$, determines knowledge $\mathcal{E}_{R_t}$ about $U$, and each block $[x]_{R_t}$ from $\mathcal{E}_{R_t}$ is a concept called a basic concept. To simplify notation we will write $\mathcal{E}_t$ instead of $\mathcal{E}_{R_t}$ and $[x]_t$ instead of $[x]_{R_t}$ for any $t \in T$.

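Let $\mathcal{K} = (U, \{R_t\}_{t \in T})$ be a knowledge base such that $\{R_t\}_{t \in T}$ is the family of all equivalence relations on $U$, and let $\mathcal{E} = \{\mathcal{E}_t\}_{t \in T}$ be the family of all partitions determined by $\{R_t\}_{t \in T}$. Let $\preceq$ be a binary relation on $\mathcal{E}$ defined as follows:

$$\mathcal{E}_s \preceq \mathcal{E}_t \iff \forall x\, \exists y\; [x]_s \subseteq [y]_t.$$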

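It can be proved that $\preceq$ is an ordering relation on $\mathcal{E}$, that is, $(\mathcal{E}, \preceq)$ is a poset. Intuitively, $\mathcal{E}_s \preceq \mathcal{E}_t$ means that $\mathcal{E}_s$ is finer than $\mathcal{E}_t$.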
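For a finite universe the ordering can be tested directly. The following sketch uses hypothetical helper names and our own encoding of partitions as frozensets of frozensets; it implements $\preceq$ literally from the definition and checks reflexivity, antisymmetry and transitivity over all 15 partitions of a four-element universe. It illustrates the claim on one small case rather than proving it.

```python
def part(*blocks):
    """Build a partition (a frozenset of frozensets) from the given blocks."""
    return frozenset(frozenset(b) for b in blocks)


def refines(p, q):
    """p ≼ q: every block of p is contained in some block of q."""
    return all(any(x <= y for y in q) for x in p)


def partitions(items):
    """Yield every partition of the list `items` as a frozenset of frozensets."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield p | {frozenset([first])}  # `first` in a block of its own
        for block in p:
            yield (p - {block}) | {block | {first}}  # `first` joins a block


P = list(partitions(["o1", "o2", "o3", "o4"]))
reflexive = all(refines(p, p) for p in P)
antisymmetric = all(p == q for p in P for q in P if refines(p, q) and refines(q, p))
transitive = all(refines(p, r) for p in P for q in P for r in P
                 if refines(p, q) and refines(q, r))
print(len(P), reflexive, antisymmetric, transitive)  # 15 True True True
```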

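Denote by $\mathcal{E}_s \wedge \mathcal{E}_t$ the following set:

$$\mathcal{E}_s \wedge \mathcal{E}_t = \{[x]_s \cap [y]_t : [x]_s \cap [y]_t \neq \emptyset\}.$$

It can easily be proved that $\mathcal{E}_s \wedge \mathcal{E}_t$ is a partition of $U$ and, moreover, that $\mathcal{E}_s \wedge \mathcal{E}_t$ is the infimum (inf) of $\mathcal{E}_s$ and $\mathcal{E}_t$ in $(\mathcal{E}, \preceq)$, that is, for every $s, t \in T$, $\inf\{\mathcal{E}_s, \mathcal{E}_t\}$ exists in the poset $(\mathcal{E}, \preceq)$. We call $\mathcal{E}_s \wedge \mathcal{E}_t$ a strong common knowledge of $\mathcal{E}_s$ and $\mathcal{E}_t$.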

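Now put

$$\mathcal{E}_s \vee \mathcal{E}_t = \{X \subseteq U : X \text{ is the union of all } [x]_s \text{ and } [y]_t \text{ such that } [x]_s \cap [y]_t \neq \emptyset\},$$

where the merging is repeated along chains of overlapping blocks. It is easy to show that $\mathcal{E}_s \vee \mathcal{E}_t$ is a partition of $U$. Moreover, $\mathcal{E}_s \vee \mathcal{E}_t$ is the supremum (sup) of $\mathcal{E}_s$ and $\mathcal{E}_t$ in $(\mathcal{E}, \preceq)$. Thus for any $\mathcal{E}_s$ and $\mathcal{E}_t$ in $\mathcal{E}$, $\sup\{\mathcal{E}_s, \mathcal{E}_t\}$ exists in the poset $(\mathcal{E}, \preceq)$. We call the partition $\mathcal{E}_s \vee \mathcal{E}_t$ a weak common knowledge of $\mathcal{E}_s$ and $\mathcal{E}_t$.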

Lemma 2.1 $(\mathcal{E}, \vee, \wedge, 0, 1)$ is a lattice with the zero element $0$ (the partition of $U$ into singletons) and the unit element $1 = \{U\}$.


□ 

We call every sublattice of the lattice $(\mathcal{E}, \vee, \wedge, 0, 1)$ a lattice of partitions. One can show that a lattice of partitions need not be distributive.
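A standard counterexample on a three-element universe shows the failure of distributivity; the sketch below is an illustration we supply, not an example from the paper. With $x = \{\{1,2\},\{3\}\}$, $y = \{\{1,3\},\{2\}\}$ and $z = \{\{2,3\},\{1\}\}$ we get $x \wedge (y \vee z) = x$, whereas $(x \wedge y) \vee (x \wedge z) = 0$. The helpers `meet` and `join` are hypothetical names for the operations $\wedge$ and $\vee$ defined above.

```python
from itertools import product


def part(*blocks):
    """Build a partition (a frozenset of frozensets) from the given blocks."""
    return frozenset(frozenset(b) for b in blocks)


def meet(p, q):
    """p ∧ q: all nonempty intersections of blocks."""
    return frozenset(a & b for a, b in product(p, q) if a & b)


def join(p, q):
    """p ∨ q: merge chains of overlapping blocks."""
    merged = []
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            merged.remove(m)
            block |= m
        merged.append(block)
    return frozenset(frozenset(m) for m in merged)


x = part({1, 2}, {3})
y = part({1, 3}, {2})
z = part({2, 3}, {1})

lhs = meet(x, join(y, z))           # x ∧ (y ∨ z) = x ∧ 1 = x
rhs = join(meet(x, y), meet(x, z))  # (x ∧ y) ∨ (x ∧ z) = 0 ∨ 0 = 0
print(lhs == rhs)  # False: the distributive law fails
```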




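We finish this section with the following remark. Let $\mathcal{K} = (U, \{R_t\}_{t \in T})$ be a knowledge base over $U$. It can easily be shown that the relation $\leq$ defined on the set of indices $T$ as follows:

$$s \leq t \iff \mathcal{E}_t \preceq \mathcal{E}_s$$

is an ordering relation on $T$. Thus $(T, \leq)$ is a poset; note that it reverses the refinement order, so a larger index corresponds to finer knowledge. Moreover, if $\mathcal{E}$ is a lattice of partitions, that is, $\mathcal{E}$ is closed under the binary operations $\vee$ and $\wedge$ defined above and $\inf\mathcal{E}$ and $\sup\mathcal{E}$ belong to $\mathcal{E}$, then one can prove that $\sup\{s, t\}$, denoted $s \vee t$, and $\inf\{s, t\}$, denoted $s \wedge t$, exist in $(T, \leq)$. Namely, we have

$$s \vee t = \sup\{s, t\} \iff \mathcal{E}_{s \vee t} = \mathcal{E}_s \wedge \mathcal{E}_t,$$

and

$$s \wedge t = \inf\{s, t\} \iff \mathcal{E}_{s \wedge t} = \mathcal{E}_s \vee \mathcal{E}_t.$$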

3  Lower  and  Upper  Approximation 

Observe that some concepts may be expressed as the set-theoretical union of certain basic concepts from one knowledge about $U$, but they cannot be defined as the union of basic blocks from another knowledge. Hence, if a concept cannot be covered by basic concepts from a given knowledge $\mathcal{E}_t$, then the question arises whether it can be "approximately" defined by $\mathcal{E}_t$. In this section we discuss this problem.

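Let $R$ be any equivalence relation over $U$. With every concept $X$, $X \subseteq U$, we associate three sets, $\underline{R}(X)$, $\overline{R}(X)$ and $B_R(X)$, called the R-lower approximation of $X$, the R-upper approximation of $X$ and the R-boundary of $X$, respectively, where

$$\underline{R}(X) = \bigcup\{[x]_R : [x]_R \subseteq X\},$$

$$\overline{R}(X) = \bigcup\{[x]_R : [x]_R \cap X \neq \emptyset\},$$

and

$$B_R(X) = \overline{R}(X) - \underline{R}(X).$$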

Intuitively speaking, the lower approximation of $X$ is the collection of all elements of the universe which can be classified with full certainty as elements of $X$, using knowledge $R$. The upper approximation of $X$ is the collection of objects from the universe $U$ which can possibly be classified as elements of $X$, using knowledge $R$. Finally, the boundary of $X$ is in a sense the undecidable area of the universe, that is, none of the objects belonging to $B_R(X)$ can be classified with certainty into $X$ or $-X$ as far as knowledge $R$ is concerned.

We say that a concept $X$ is R-definable if $\underline{R}(X) = \overline{R}(X)$. If for some subset $X$ of $U$, $\underline{R}(X) \neq \overline{R}(X)$, then $X$ is said to be R-rough.
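A minimal sketch of the three sets, assuming the partition is given explicitly as a frozenset of frozensets (the function names are ours, not the paper's). It uses the partition $\mathcal{E}_t$ from the introduction and the concept $X = \{o_1, o_2, o_3\}$, which turns out to be $R_t$-rough.

```python
def part(*blocks):
    """Build a partition (a frozenset of frozensets) from the given blocks."""
    return frozenset(frozenset(b) for b in blocks)


def lower(partition, X):
    """R-lower approximation: union of the blocks contained in X."""
    return set().union(*(b for b in partition if b <= X))


def upper(partition, X):
    """R-upper approximation: union of the blocks that meet X."""
    return set().union(*(b for b in partition if b & X))


def boundary(partition, X):
    """R-boundary: upper approximation minus lower approximation."""
    return upper(partition, X) - lower(partition, X)


def is_definable(partition, X):
    """X is R-definable iff its lower and upper approximations coincide."""
    return lower(partition, X) == upper(partition, X)


E_t = part({"o1", "o2"}, {"o3", "o4"}, {"o5"})
X = {"o1", "o2", "o3"}
print(sorted(lower(E_t, X)))     # ['o1', 'o2']
print(sorted(upper(E_t, X)))     # ['o1', 'o2', 'o3', 'o4']
print(sorted(boundary(E_t, X)))  # ['o3', 'o4']
print(is_definable(E_t, X), is_definable(E_t, {"o1", "o2", "o5"}))  # False True
```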

From now on we will assume that $\mathcal{E} = \{\mathcal{E}_t\}_{t \in T}$ is a lattice of partitions, and we will also consider $T$ as a lattice.

Now, let $\mathcal{R} = \{R_t\}_{t \in T}$ be the family of equivalence relations over $U$ determined by $\{\mathcal{E}_t\}_{t \in T}$.

Because of the duality between lower and upper approximation, we list only properties of the lower approximation.




Lemma 3.1 For every $s, t \in T$, $\mathcal{E}_s \preceq \mathcal{E}_t$ if and only if $\underline{R}_t(X) \subseteq \underline{R}_s(X)$ for every $X \subseteq U$.


□ 


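Lemma 3.2 Let $\mathcal{K} = (U, \{R_t\}_{t \in T})$ be a knowledge base such that $\{R_t\}_{t \in T}$ is defined by $\{\mathcal{E}_t\}_{t \in T}$. For all equivalence relations $R_s$ and $R_t$ from $\mathcal{R}$ and every set $X \subseteq U$ the following hold:

1. $\underline{R}_s(X) \cup \underline{R}_t(X) \subseteq \underline{R}_{s \vee t}(X)$.

2. $\underline{R}_{s \wedge t}(X) \subseteq \underline{R}_s(X) \cup \underline{R}_t(X)$.

3. $\underline{R}_s(\underline{R}_t(X)) \subseteq \underline{R}_t(X)$.

4. $\mathcal{E}_s \preceq \mathcal{E}_t$ implies $\underline{R}_s(\underline{R}_t(X)) = \underline{R}_t(X)$.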

□ 
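Because the universe is finite, Lemmas 3.1 and 3.2 can be sanity-checked by brute force. The sketch below is our own test harness, not part of the paper: it ranges over all pairs of partitions of a four-element universe and all subsets $X$, using the correspondence $\mathcal{E}_{s \vee t} = \mathcal{E}_s \wedge \mathcal{E}_t$ and $\mathcal{E}_{s \wedge t} = \mathcal{E}_s \vee \mathcal{E}_t$ from Section 2. A `True` result only confirms the statements on this small case.

```python
from itertools import combinations, product


def meet(p, q):
    """Partition of R_{s∨t}: nonempty intersections of blocks."""
    return frozenset(a & b for a, b in product(p, q) if a & b)


def join(p, q):
    """Partition of R_{s∧t}: merge chains of overlapping blocks."""
    merged = []
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            merged.remove(m)
            block |= m
        merged.append(block)
    return frozenset(frozenset(m) for m in merged)


def partitions(items):
    """Yield every partition of the list `items`."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield p | {frozenset([first])}
        for block in p:
            yield (p - {block}) | {block | {first}}


def refines(p, q):
    """p ≼ q: every block of p lies inside some block of q."""
    return all(any(x <= y for y in q) for x in p)


def lower(p, X):
    """Lower approximation of X with respect to partition p."""
    return set().union(*(b for b in p if b <= X))


U = ["o1", "o2", "o3", "o4"]
P = list(partitions(U))
ALL_X = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]


def lemmas_3_1_and_3_2_hold():
    for Es, Et in product(P, P):
        finer, coarser = meet(Es, Et), join(Es, Et)  # E_{s∨t}, E_{s∧t}
        # Lemma 3.1: Es ≼ Et iff lower_t(X) ⊆ lower_s(X) for every X
        if refines(Es, Et) != all(lower(Et, X) <= lower(Es, X) for X in ALL_X):
            return False
        for X in ALL_X:
            ls, lt = lower(Es, X), lower(Et, X)
            checks = (
                (ls | lt) <= lower(finer, X),                # Lemma 3.2, item 1
                lower(coarser, X) <= (ls | lt),              # item 2
                lower(Es, lt) <= lt,                         # item 3
                not refines(Es, Et) or lower(Es, lt) == lt,  # item 4
            )
            if not all(checks):
                return False
    return True


print(lemmas_3_1_and_3_2_hold())  # True
```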

The next lemma characterizes the boundary of $X$ determined by any equivalence relation from $\mathcal{R}$.

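Lemma 3.3 Let $(U, \{R_t\}_{t \in T})$ be a knowledge base, and write $B_s$ for $B_{R_s}$. For every $X \subseteq U$ the following conditions are true:

1. $\mathcal{E}_s \preceq \mathcal{E}_t$ if and only if $B_s(Y) \subseteq B_t(Y)$ for every $Y \subseteq U$.

2. $B_{s \vee t}(X) \subseteq B_s(X) \cap B_t(X)$.

3. $B_s(X) \cup B_t(X) \subseteq B_{s \wedge t}(X)$.

4. $B_s(X) = B_s(-X)$.

5. $\underline{R}_s(B_s(X)) = B_s(X)$.

6. $B_t(X) = \emptyset$ if and only if $\underline{R}_t(X) = X$.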
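The same brute-force approach, again a hypothetical harness of our own rather than anything from the paper, checks the six items of Lemma 3.3 on all pairs of partitions of a four-element universe. Item 1 is tested with its quantifier ranging over all subsets $Y$.

```python
from itertools import combinations, product


def meet(p, q):
    """Partition of R_{s∨t}: nonempty intersections of blocks."""
    return frozenset(a & b for a, b in product(p, q) if a & b)


def join(p, q):
    """Partition of R_{s∧t}: merge chains of overlapping blocks."""
    merged = []
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            merged.remove(m)
            block |= m
        merged.append(block)
    return frozenset(frozenset(m) for m in merged)


def partitions(items):
    """Yield every partition of the list `items`."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield p | {frozenset([first])}
        for block in p:
            yield (p - {block}) | {block | {first}}


def refines(p, q):
    return all(any(x <= y for y in q) for x in p)


def lower(p, X):
    return set().union(*(b for b in p if b <= X))


def boundary(p, X):
    """Upper approximation minus lower approximation."""
    upper = set().union(*(b for b in p if b & X))
    return upper - lower(p, X)


U = {"o1", "o2", "o3", "o4"}
P = list(partitions(sorted(U)))
ALL_X = [set(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]


def lemma_3_3_holds():
    for Es, Et in product(P, P):
        finer, coarser = meet(Es, Et), join(Es, Et)  # E_{s∨t}, E_{s∧t}
        # item 1: Es ≼ Et iff B_s(Y) ⊆ B_t(Y) for every Y
        if refines(Es, Et) != all(boundary(Es, Y) <= boundary(Et, Y) for Y in ALL_X):
            return False
        for X in ALL_X:
            bs, bt = boundary(Es, X), boundary(Et, X)
            checks = (
                boundary(finer, X) <= (bs & bt),    # item 2
                (bs | bt) <= boundary(coarser, X),  # item 3
                bs == boundary(Es, U - X),          # item 4
                lower(Es, bs) == bs,                # item 5
                (not bt) == (lower(Et, X) == X),    # item 6
            )
            if not all(checks):
                return False
    return True


print(lemma_3_3_holds())  # True
```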

4  Information  Systems 

In this section we recall the notion of an information system introduced by Pawlak [8]. An information system is a pair $S = (U, A)$, where

$U$ - a nonempty, finite set of objects, called the universe of discourse of $S$;

$A$ - a finite set of attributes, i.e., $a : U \to V_a$ for $a \in A$, where $V_a$ is called the value set of $a$.

The set $V = \bigcup_{a \in A} V_a$ is called the domain of $A$.

For any object $o$ from $U$, $o \mapsto (a, v)$ means that the object $o$ has the value $v$ for the attribute $a$. For instance, if an attribute $a$ is color and its value $v$ is green, then $o \mapsto (\text{color}, \text{green})$ means that the object $o$ is green in $S$.

Instead of $o \mapsto (a, v)$ we will also write $a(o) = v$.

With every subset of attributes $B \subseteq A$ we associate a binary relation $ind(B)$, called the indiscernibility relation, defined as follows:

$$ind(B) = \{(x, y) \in U \times U : \forall a \in B\; a(x) = a(y)\}.$$




Notice that objects $x, y$ satisfying the relation $ind(B)$ are indiscernible with respect to the attributes from $B$. In other words, the information system $S$ does not distinguish $x$ from $y$ in terms of the attributes in $B$. By $[o]_B$ we denote the equivalence class of $ind(B)$ containing the object $o$, i.e., the set $\{y \in U : o \; ind(B) \; y\}$. Clearly, for every $B \subseteq A$ the family of all equivalence classes of the relation $ind(B)$ is a partition of $U$. Denote this partition in the following way:

$$\mathcal{E}_B = \{[o_1]_B, \ldots, [o_l]_B\}.$$


Let $\mathcal{E} = \{\mathcal{E}_B\}_{B \subseteq A}$. Clearly $\mathcal{E}$ is a lower semi-lattice. One can show that $\mathcal{E}$ is a lattice of partitions.

Because each indiscernibility relation is an equivalence relation, we can express the notions of lower and upper approximation from Section 3 in terms of information systems.

Let an information system $S = (U, A)$ be given. For every $B \subseteq A$, $\underline{ind(B)}(X)$ is the lower approximation of $X$ and $\overline{ind(B)}(X)$ is the upper approximation of $X$ in $S$. To simplify notation we will denote these sets by $\underline{B}(X)$ and $\overline{B}(X)$, respectively, and call them the B-lower approximation and the B-upper approximation of $X$. A set $X \subseteq U$ is said to be B-definable if and only if $\underline{B}(X) = \overline{B}(X)$, and $X$ is called B-rough if $\underline{B}(X) \neq \overline{B}(X)$.
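The notions of this section can be illustrated on a small attribute-value table. The table below is hypothetical: its attributes `color` and `shape` were chosen so that $\mathcal{E}_{\{color\}}$ and $\mathcal{E}_{\{shape\}}$ coincide with the partitions $\mathcal{E}_s$ and $\mathcal{E}_t$ of the introduction. The sketch (helper names are ours) computes $\mathcal{E}_B$ from $ind(B)$ and the B-lower and B-upper approximations of $X = \{o_2, o_3\}$.

```python
TABLE = {  # hypothetical information system S = (U, A) with A = {color, shape}
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}


def ind_partition(table, B):
    """E_B: the objects grouped into the equivalence classes of ind(B)."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in sorted(B))  # the values on B identify the class
        classes.setdefault(key, set()).add(obj)
    return frozenset(frozenset(c) for c in classes.values())


def lower(partition, X):
    """B-lower approximation: union of the classes contained in X."""
    return set().union(*(b for b in partition if b <= X))


def upper(partition, X):
    """B-upper approximation: union of the classes that meet X."""
    return set().union(*(b for b in partition if b & X))


X = {"o2", "o3"}
for B in ({"color"}, {"color", "shape"}):
    E_B = ind_partition(TABLE, B)
    print(sorted(B), sorted(lower(E_B, X)), sorted(upper(E_B, X)))
# ['color'] [] ['o1', 'o2', 'o3']
# ['color', 'shape'] ['o3'] ['o1', 'o2', 'o3']
```

Both attribute sets leave $X$ B-rough; adding `shape` enlarges the lower approximation but not the upper one.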

5  Knowledge  Representation  Systems 

It turns out that every knowledge base may be represented as an information system, that is, in the form of an attribute-value table.

Let $\mathcal{E}_t$ be knowledge about $U$. It is not difficult to construct an information system $S_t = (U, A_t)$ such that $\mathcal{E}_t = \mathcal{E}_{A_t}$. We call an information system $S_t = (U, A_t)$ a knowledge representation system (k.r.s.) for $\mathcal{E}_t$ provided $\mathcal{E}_t = \mathcal{E}_{A_t}$. If $\mathcal{K} = (U, \{R_t\}_{t \in T})$ is a knowledge base, then the family $S = \{S_t\}_{t \in T}$ is said to be a distributive knowledge representation system of $\mathcal{K}$ provided that, for every $t \in T$, $S_t$ is a k.r.s. of $\mathcal{E}_t$.

Observe that if $S = \{S_t\}_{t \in T}$ is a distributive knowledge representation system of $\mathcal{K} = (U, \{R_t\}_{t \in T})$, then the relation $\leq$ defined in Section 2 may be viewed as a relation between knowledge representation systems. Namely, the intuitive meaning of the relation $\leq$ is that if $s \leq t$, then the feature recognition of objects from $U$ by the k.r.s. $S_s = (U, A_s)$ is weaker than that of $S_t = (U, A_t)$. In other words, the classification of objects from the universe $U$ by means of the set of attributes $A_t$ is better than the classification of these objects by means of $A_s$. If $s \leq t$, then we say that the k.r.s. $S_s$ is less efficient than, or weaker than, the system $S_t$.

Let $\mathcal{K}$ be a knowledge base and let $S = \{S_t\}_{t \in T}$ be a distributive knowledge representation system of $\mathcal{K}$. Notice that $(U, A)$, where $A = \bigcup_{t \in T} A_t$, has the following property: for every $t \in T$ there is a set $B \subseteq A$ such that $ind(B) = R_t$. Indeed, it follows from the properties of the indiscernibility relation (see Section 4), in particular from $ind(A_s \cup A_t) = ind(A_s) \cap ind(A_t)$, that if $S_s = (U, A_s)$ and $S_t = (U, A_t)$ are knowledge representation systems of $\mathcal{E}_s$ and $\mathcal{E}_t$, respectively, then a knowledge representation system of the strong common knowledge $\mathcal{E}_{s \vee t}$ is the information system $(U, A_s \cup A_t)$. Hence, for every knowledge base $\mathcal{K} = (U, \{R_t\}_{t \in T})$ there is an information system $S = (U, A)$ such that for every $t \in T$ there is a set $B \subseteq A$ for which $ind(B) = R_t$. We also call such a system a knowledge representation system for $\mathcal{K}$.
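The claim that $(U, A_s \cup A_t)$ represents strong common knowledge can be checked on an example. The sketch below uses a hypothetical table and helper names of our own; it verifies $\mathcal{E}_{A_s \cup A_t} = \mathcal{E}_{A_s} \wedge \mathcal{E}_{A_t}$ for every pair of attribute subsets, and also shows that $A_s \cap A_t$ need not represent the weak common knowledge, as the introduction notes.

```python
from itertools import combinations, product

TABLE = {  # hypothetical information system
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}


def ind_partition(table, B):
    """E_B: classes of ind(B)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(B)), set()).add(obj)
    return frozenset(frozenset(c) for c in classes.values())


def meet(p, q):
    """Strong common knowledge: nonempty intersections of blocks."""
    return frozenset(a & b for a, b in product(p, q) if a & b)


def join(p, q):
    """Weak common knowledge: merge chains of overlapping blocks."""
    merged = []
    for block in list(p) + list(q):
        block = set(block)
        for m in [m for m in merged if m & block]:
            merged.remove(m)
            block |= m
        merged.append(block)
    return frozenset(frozenset(m) for m in merged)


ATTRS = ["color", "shape"]
SUBSETS = [set(c) for r in range(len(ATTRS) + 1) for c in combinations(ATTRS, r)]

# The union of attribute sets represents strong common knowledge.
union_ok = all(
    ind_partition(TABLE, As | At) == meet(ind_partition(TABLE, As), ind_partition(TABLE, At))
    for As, At in product(SUBSETS, SUBSETS)
)
print(union_ok)  # True

# The intersection of attribute sets need not represent weak common knowledge.
weak = join(ind_partition(TABLE, {"color"}), ind_partition(TABLE, {"shape"}))
print(sorted(sorted(b) for b in weak))                      # [['o1', 'o2', 'o3', 'o4'], ['o5']]
print(weak == ind_partition(TABLE, {"color"} & {"shape"}))  # False: E_∅ is {U}
```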

It is worthwhile to observe that for every information system $S = (U, A)$ there is a knowledge base $\mathcal{K} = (U, \{R_t\}_{t \in T})$ such that for every $B \subseteq A$ there is an equivalence relation $R_t$ such that $ind(B) = R_t$. Namely, take as $\mathcal{K}$ the system $(U, \{ind(B)\}_{B \subseteq A})$.

From now on the notion of knowledge representation system will be used as a synonym for information system.

6  Approximation  Logic 

6.1  Syntax 

In this section we define a logic which may be used as a logical tool in the investigation of distributive knowledge representation systems. We want to construct a formal system which allows us to formalize relations between knowledge representation systems and which describes lower and upper approximations of any concept as well as the boundary regions of concepts.

We call this logic approximation logic and denote it A-logic. Let us emphasize that the A-logic defined here is not related to the approximation logics defined and examined in [10].

Let $A$ be a finite set of attributes and let $V = \bigcup_{a \in A} V_a$, where $\{V_a\}_{a \in A}$ is a family of finite sets. Each $V_a$ may be treated as the domain of an attribute $a \in A$.

We associate with $A$ and $V$ a language $\mathcal{L}$, called the formal language of A-logic. All subsets of $A$ and elements of $V$ are treated as constants of $\mathcal{L}$. Subsets of $A$ are called attribute constants and denoted by $a, b, c, \ldots$ and $A, B, C, \ldots$. Elements of $V$ are called attribute-value constants and denoted by $v, u, \ldots$.

The language $\mathcal{L}$ consists of two levels. The expressions of these levels are called formulae of the 1st kind and formulae of the 2nd kind, respectively. Intuitively, formulae of the 1st kind describe properties of knowledge understood as a partition of the universe $U$, whereas formulae of the 2nd kind express certain facts concerning sets of objects of $U$ and approximations of these sets.

To give a formal definition of the sets of formulae, we first define the terms of $\mathcal{L}$.

Terms are built up from attribute constants, two constants 0 and 1, and the operations $\vee$ and $\wedge$. More precisely, the set of all terms is defined to be the least set $\mathcal{T}$ with the following three properties:

- 0 and 1 are in $\mathcal{T}$,

- all attribute constants are in $\mathcal{T}$,

- $B \vee C$ and $B \wedge C$ are in $\mathcal{T}$ whenever $B, C$ are terms.

Formulae of the 1st kind are built up from terms and two predicates $\leq$ and $=$. The set $F_1$ of all formulae of the 1st kind is the smallest set such that

- if $B, C \in \mathcal{T}$, then $B \leq C \in F_1$ and $B = C \in F_1$.

Intuitively, formulae of the 1st kind express the power of knowledge. Namely, $B \leq C$ may be interpreted in the following way: knowledge $\mathcal{E}_B$ is weaker than knowledge $\mathcal{E}_C$.

The set $F_2$ of all formulae of the 2nd kind is the smallest set containing all atomic formulae, which are of the form $(a, v)$, where $a \in A$ and $v \in V_a$, and closed with respect to the propositional connectives $\vee$, $\wedge$, $\to$, $\neg$ and the family $\{I_B\}_{B \in \mathcal{T}}$ of modal connectives. For every $B$, $I_B$ is called a necessity operator.

Axioms for A-logic consist of three groups: one for formulae of the 1st kind, one for formulae of the 2nd kind, and one containing specific axioms for A-logic. Axioms of the last group express characteristic properties of knowledge representation systems.

Axioms for formulae of the 1st kind are as follows:

(t1) all axioms for equality and for the ordering relation,

(t2) $B \leq B \vee C$, $C \leq B \vee C$,

(t3) $B \wedge C \leq B$, $B \wedge C \leq C$,

where $B$ and $C$ are any terms.

As axioms for formulae of the 2nd kind we assume all axioms of classical logic, enriched by axioms for the necessity operators.

The specific axioms of A-logic are as follows:

1. $(a, v) \wedge (a, u) \to \bot$, for any $a \in A$, $v, u \in V_a$ and $v \neq u$,

2. $\bigvee_{v \in V_a} (a, v)$, for every $a \in A$,

3. $\neg(a, v) \to \bigvee\{(a, u) : u \in V_a, u \neq v\}$, for every $a \in A$,

where $\bigvee$ means a finite disjunction.

The motivation for the axioms above is the following: the specific axioms should be characteristic of our notion of a knowledge representation system.

Observe that the first axiom follows from the assumption that each object has at most one value for each attribute (together with axiom 2, exactly one).

Axiom (2) follows from the assumption that each object in any knowledge representation system has a value with respect to every attribute. Hence, the description of objects is complete up to a given set of attributes. In other words, for every $a \in A$ and every object $x$, the entry in the row $x$ and the column $a$ (in $S$ viewed as a table) is nonempty.

The third axiom allows us to express negation in such a way that, instead of saying that an object does not possess a given property, we can say that it has one of the remaining properties. For example, instead of saying that something is not blue, we can say that it is either red or green or yellow, etc.
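To make the specific axioms concrete, the following sketch checks whether a table satisfies axioms 1 to 3 at every object. The table and the value sets `DOMAINS` are hypothetical assumptions of ours (the unused value `yellow` in $V_{color}$ mirrors the example above); a missing entry, or an entry outside $V_a$, violates axioms 2 and 3.

```python
TABLE = {  # hypothetical information system
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}
DOMAINS = {  # hypothetical value sets V_a
    "color": {"red", "blue", "green", "yellow"},
    "shape": {"round", "square", "oval"},
}


def sat(table, x, a, v):
    """x satisfies the atomic formula (a, v) iff a(x) = v (a missing entry never does)."""
    return table[x].get(a) == v


def axioms_hold(table, domains):
    """Check specific axioms 1-3 of A-logic at every object of the table."""
    for x in table:
        for a, Va in domains.items():
            # axiom 2: the disjunction of all (a, v), v in V_a, holds at x
            if not any(sat(table, x, a, v) for v in Va):
                return False
            for v in Va:
                others = [u for u in Va if u != v]
                # axiom 1: (a, v) ∧ (a, u) → ⊥ for u ≠ v (cannot fail for a
                # dict-based table, which stores one value per attribute)
                if any(sat(table, x, a, v) and sat(table, x, a, u) for u in others):
                    return False
                # axiom 3: ¬(a, v) → ⋁{(a, u) : u ∈ V_a, u ≠ v}
                if not sat(table, x, a, v) and not any(sat(table, x, a, u) for u in others):
                    return False
    return True


print(axioms_hold(TABLE, DOMAINS))  # True
```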

We say that a formula $\varphi$ is derivable in A-logic from a set of formulae $F$, denoted by $F \vdash \varphi$, provided it can be concluded from $F$ by means of the above axioms and the following rules: modus ponens and, for all terms $B$, $C$ and $D$,

$$\frac{B \leq C}{I_B\varphi \to I_C\varphi} \qquad \frac{B \leq D \quad C \leq D}{B \vee C \leq D} \qquad \frac{D \leq B \quad D \leq C}{D \leq B \wedge C}$$

If $\varphi$ is derivable from the empty set, then we write $\vdash \varphi$ and say that $\varphi$ is derivable. Clearly, all classical tautologies are derivable. Also $\vdash \top$ and $\nvdash \bot$. Also we have, e.g., $\vdash (I_B\varphi \vee I_C\varphi) \to I_{B \vee C}\varphi$.

6.2  Semantics 

Intuitively speaking, formulae of A-logic are meant as descriptions of objects of the universe and as connections between partitions of the universe. Formulae describe subsets of objects obeying the properties expressed by these formulae. For instance, a natural interpretation of an atomic formula $(a, v)$ is the set of all objects having value $v$ for the attribute $a$. Hence, a natural interpretation of a formula of the form $I_B(a, v)$ is the B-lower approximation of the set of all objects having the property $v$ for the attribute $a$.

Now we give a semantics for the language of A-logic. Let $\mathcal{L}$ be the language described above, determined by $A$ and $V = \bigcup_{a \in A} V_a$. Let $S = (U, A)$ be an information system with the universe $U$, $|U| \geq 2$, the set of attributes $A$ and the domain of attributes $V$. Moreover, let the family $\mathcal{E} = \{\mathcal{E}_B\}_{B \subseteq A}$ of all partitions of $U$ determined by $S$ be the lattice of partitions. Recall that $\mathcal{E}_B = \{[x_1]_B, \ldots, [x_l]_B\}$ is the partition of the universe $U$ given by $ind(B)$.

We will interpret terms as partitions of the universe $U$. Namely, if $B$ is a term, then we interpret it as the partition $\mathcal{E}_B$ of $U$. The predicate $\leq$ is interpreted as the converse of the relation $\preceq$. Thus if $B \leq C$ is a formula of the 1st kind, then we interpret it as $\mathcal{E}_C \preceq \mathcal{E}_B$. Clearly, the interpretation of a formula of the form $B = C$ is that the partitions $\mathcal{E}_B$ and $\mathcal{E}_C$ of $U$ are the same.

We say that a formula $\varphi$ of the 1st kind is true in $S$, denoted by $\models_S \varphi$, if the corresponding relation holds in the lattice of partitions of $S$. More precisely, a formula of the form $B \leq C$ (or $B = C$) is true in $S$ if $\mathcal{E}_C \preceq \mathcal{E}_B$ (or $\mathcal{E}_B = \mathcal{E}_C$) holds in the lattice $\mathcal{E}$.
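The semantics of formulae of the 1st kind can be evaluated directly on a table. In the sketch below (hypothetical table and helper names, ours rather than the paper's), `holds_leq` decides $\models_S B \leq C$ by testing $\mathcal{E}_C \preceq \mathcal{E}_B$. The final check confirms, on this table only, that the rule deriving $I_B\varphi \to I_C\varphi$ from $B \leq C$ preserves truth: a finer partition yields a larger lower approximation.

```python
from itertools import combinations

TABLE = {  # hypothetical information system
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}


def ind_partition(table, B):
    """E_B: classes of ind(B)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in sorted(B)), set()).add(obj)
    return frozenset(frozenset(c) for c in classes.values())


def refines(p, q):
    """p ≼ q: every block of p lies inside some block of q."""
    return all(any(x <= y for y in q) for x in p)


def lower(p, X):
    return set().union(*(b for b in p if b <= X))


def holds_leq(table, B, C):
    """⊨_S B ≤ C iff E_C ≼ E_B, i.e. E_B is the weaker knowledge."""
    return refines(ind_partition(table, C), ind_partition(table, B))


print(holds_leq(TABLE, {"color"}, {"color", "shape"}))  # True
print(holds_leq(TABLE, {"color", "shape"}, {"color"}))  # False
print(holds_leq(TABLE, set(), {"shape"}))               # True: E_∅ = {U} is coarsest

# Soundness of the rule  B ≤ C / I_B φ → I_C φ  on this table, for every |φ| = X.
objs = sorted(TABLE)
all_X = [set(c) for r in range(len(objs) + 1) for c in combinations(objs, r)]
attr_sets = [set(c) for r in range(3) for c in combinations(["color", "shape"], r)]
sound = all(
    lower(ind_partition(TABLE, B), X) <= lower(ind_partition(TABLE, C), X)
    for B in attr_sets for C in attr_sets if holds_leq(TABLE, B, C)
    for X in all_X
)
print(sound)  # True
```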

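Let $\varphi$ be a formula of the 2nd kind from $\mathcal{L}$.

We say that an object $x \in U$ satisfies an atomic formula $(a, v)$ if and only if $a(x) = v$. In the standard way we extend this definition to the set of all formulae of the 2nd kind; in particular, $x$ satisfies $I_B\varphi$ if and only if every object of $[x]_B$ satisfies $\varphi$. If $x$ satisfies $\varphi$, then we write $x \models \varphi$.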

Finally, we say that a formula $\varphi$ of the 2nd kind is true in $S$, denoted by $\models_S \varphi$, if $|\varphi|_S = \{x : x \models \varphi\} = U$. If a formula $\varphi$ (of the 1st or of the 2nd kind) is true in $S$, then we call the information system $S$ a model for $\varphi$.

We say that $F$ implies $\varphi$, denoted by $F \models \varphi$, if every model of $F$ is a model of $\varphi$.
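A small evaluator makes the satisfaction relation concrete. The encoding of formulae as nested tuples and the table are our own assumptions, not the paper's notation. The clause for `"I"` implements $x \models I_B\varphi$ iff $[x]_B \subseteq |\varphi|_S$, so $|I_B\varphi|_S$ is the B-lower approximation of $|\varphi|_S$.

```python
TABLE = {  # hypothetical information system
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}


def ind_class(table, B, x):
    """[x]_B: the objects that agree with x on every attribute in B."""
    return {y for y in table if all(table[y][a] == table[x][a] for a in B)}


def meaning(table, f):
    """|f|_S for a formula of the 2nd kind, encoded as nested tuples:
    ("atom", a, v), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g), ("I", B, f)."""
    U = set(table)
    op = f[0]
    if op == "atom":
        _, a, v = f
        return {x for x in U if table[x][a] == v}
    if op == "not":
        return U - meaning(table, f[1])
    if op == "and":
        return meaning(table, f[1]) & meaning(table, f[2])
    if op == "or":
        return meaning(table, f[1]) | meaning(table, f[2])
    if op == "imp":
        return (U - meaning(table, f[1])) | meaning(table, f[2])
    if op == "I":  # x ⊨ I_B g  iff  [x]_B ⊆ |g|_S
        _, B, g = f
        inner = meaning(table, g)
        return {x for x in U if ind_class(table, B, x) <= inner}
    raise ValueError(f"unknown connective: {op!r}")


def is_true(table, f):
    """⊨_S f iff |f|_S = U."""
    return meaning(table, f) == set(table)


red = ("atom", "color", "red")
print(sorted(meaning(TABLE, red)))                          # ['o1', 'o2', 'o3']
print(sorted(meaning(TABLE, ("I", {"shape"}, red))))        # ['o1', 'o2']
print(is_true(TABLE, ("imp", ("I", {"shape"}, red), red)))  # True
```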

The next two theorems show that our axiomatization has been adequately chosen and that it is complete.




Theorem 6.1 (soundness) Let $F$ be a set of formulae. If $F \vdash \varphi$, then $F \models \varphi$.


□ 


Theorem 6.2 (completeness) Let $F$ be a set of formulae and let $\varphi$ be a formula of $\mathcal{L}$. If $F \models \varphi$, then $F \vdash \varphi$.


□ 

For every term $B$ we now define two modal connectives $C_B$ and $Br_B$ as follows: for any formula $\varphi$ of the 2nd kind put

$$C_B\varphi = \neg I_B \neg\varphi$$

and

$$Br_B\varphi = C_B\varphi \wedge \neg I_B\varphi.$$

It is easy to check that

$$|C_B\varphi|_S = \overline{B}(|\varphi|_S),$$

where $C_B$ is the unary connective defined above and $\overline{B}$ is the upper approximation determined by $ind(B)$ in the knowledge representation system $S = (U, A)$. Moreover,

$$|Br_B\varphi|_S = B_B(|\varphi|_S),$$

where $B_B(|\varphi|_S)$ means the boundary of $|\varphi|_S$ in $S$ determined by $ind(B)$.
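The two identities can be checked on a small hypothetical table. The sketch below uses our own tuple encoding of formulae; `C` and `Br` build the derived formulae exactly as defined above, and the loop compares $|C_B\varphi|_S$ and $|Br_B\varphi|_S$ with the upper approximation and the boundary for every attribute subset $B$ and every atomic formula occurring in the table.

```python
from itertools import combinations

TABLE = {  # hypothetical information system
    "o1": {"color": "red", "shape": "round"},
    "o2": {"color": "red", "shape": "round"},
    "o3": {"color": "red", "shape": "square"},
    "o4": {"color": "blue", "shape": "square"},
    "o5": {"color": "green", "shape": "oval"},
}


def ind_class(table, B, x):
    """[x]_B: the objects that agree with x on every attribute in B."""
    return {y for y in table if all(table[y][a] == table[x][a] for a in B)}


def meaning(table, f):
    """|f|_S for the connectives needed here: atom, not, and, I."""
    U = set(table)
    op = f[0]
    if op == "atom":
        return {x for x in U if table[x][f[1]] == f[2]}
    if op == "not":
        return U - meaning(table, f[1])
    if op == "and":
        return meaning(table, f[1]) & meaning(table, f[2])
    if op == "I":
        inner = meaning(table, f[2])
        return {x for x in U if ind_class(table, f[1], x) <= inner}
    raise ValueError(f"unsupported connective: {op!r}")


def C(B, f):
    """C_B f = ¬ I_B ¬ f"""
    return ("not", ("I", B, ("not", f)))


def Br(B, f):
    """Br_B f = C_B f ∧ ¬ I_B f"""
    return ("and", C(B, f), ("not", ("I", B, f)))


U = set(TABLE)
attr_sets = [set(c) for r in range(3) for c in combinations(["color", "shape"], r)]
atoms = {("atom", a, row[a]) for row in TABLE.values() for a in ("color", "shape")}
ok = True
for B in attr_sets:
    for f in atoms:
        X = meaning(TABLE, f)
        upper = {x for x in U if ind_class(TABLE, B, x) & X}
        lower = {x for x in U if ind_class(TABLE, B, x) <= X}
        ok = ok and meaning(TABLE, C(B, f)) == upper           # |C_B f| = upper approx.
        ok = ok and meaning(TABLE, Br(B, f)) == upper - lower  # |Br_B f| = boundary
print(ok)  # True
```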

Theorem 6.3 Let a knowledge representation system $S = (U, A)$ be given. The following conditions are equivalent:

1. A set $X \subseteq U$ is B-definable in $S$.

2. A formula of the form $I_B\varphi \leftrightarrow \varphi$ is true in $S$, where the meaning of $\varphi$ is $X$, that is, $|\varphi|_S = X$.

3. $Br_B\varphi$ is false in $S$, that is, $|Br_B\varphi|_S = \emptyset$, where $\varphi$ is as before.

4. $|I_B\varphi|_S = X$.


□

Finally  we  have 

Theorem 6.4 An information system $S = (U, A)$ is a model for a formula $\varphi$ if and only if there is a subset $B$ of $A$ such that the information system $(U, B)$ is a model for $\varphi$. □




References 

1. Halpern, J. (ed.) (1986) Proceedings of the Conference Theoretical Aspects of Reasoning about Knowledge. Morgan Kaufmann.

2. Halpern, J.Y., Moses, Y. (1992) A guide to completeness and complexity for modal logics of knowledge and belief. Artificial Intelligence 54, pp. 319-379.

3. Hintikka, J. (1962) Knowledge and Belief. Cornell University Press.

4. Holland, J.H., Holyoak, K.J., Nisbett, R.E., and Thagard, P.R. (1986) Induction: Processes of Inference, Learning and Discovery. MIT Press.

5. Minsky, M. (1975) A Framework for Representing Knowledge. In: Winston, P. (ed.) The Psychology of Computer Vision. McGraw-Hill, New York, pp. 211-277.

6. Parikh, R. (ed.) (1990) Proceedings of the Conference Theoretical Aspects of Reasoning about Knowledge. Morgan Kaufmann.

7. Pawlak, Z. (1982) Rough Sets. International Journal of Computer and Information Sciences, 11, pp. 341-356.

8. Pawlak, Z. (1991) Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer.

9. Pawlak, Z. (1990) Decision Logic. Report of the Warsaw University of Technology, Institute of Computer Science.

10. Rasiowa, H. (1990) On Approximation Logics: A Survey. Jahrbuch 1990 der Kurt Gödel Gesellschaft.

11. Rauszer, C.M., Logic for Information Systems. To appear in Fundamenta Informaticae.

12. Skowron, A. (1988) On Topology in Information Systems. Bulletin of the Polish Academy of Sciences, vol. 36, No. 7-8, pp. 477-479.


Modelling  of  Industrial  Systems 


Lennart  Ljung 

Department of Electrical Engineering, Linköping University
S-581 83 Linköping, Sweden


Abstract 


In this contribution we give an overview of those techniques, mainly from the control field, that are used to derive models of dynamical systems from observed data.


1  Mainstream System Identification

A  typical  problem 

Here is a typical system identification problem: we observe inputs $u(t)$ and outputs $y(t)$ from a system. Here $t$ denotes sample points: $t = 1, \dots, N$. We want to construct a model of the system, and may seek a model of the simple form


$$y(t) + a\,y(t-1) = b_1 u(t-1) + b_2 u(t-2) \qquad (1)$$


It then only remains to determine suitable values of the parameters $a$, $b_1$ and $b_2$. This can be done by the well-known least squares method:

$$\min_{a,\,b_1,\,b_2} \sum_{t=1}^{N} \big(y(t) + a\,y(t-1) - b_1 u(t-1) - b_2 u(t-2)\big)^2 \qquad (2)$$




The minimizing values $\hat a_N$, $\hat b_{1,N}$ and $\hat b_{2,N}$ can in this case be computed easily, since the criterion in (2) is a quadratic function of the parameters. They give the model

$$y(t) + \hat a_N\, y(t-1) = \hat b_{1,N}\, u(t-1) + \hat b_{2,N}\, u(t-2) \qquad (3)$$

of the system. This simple system identification problem is a special case of a broad class of model building problems, which we now describe.
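For model (1), the minimization in (2) is an ordinary linear least-squares problem. The following is a minimal NumPy sketch. It assumes noise-free simulated data, and the parameter values are chosen only for illustration. It stacks the regressors $[-y(t-1),\ u(t-1),\ u(t-2)]$ row by row and solves for $[a, b_1, b_2]$ with `numpy.linalg.lstsq`.

```python
import numpy as np


def fit_arx(y, u):
    """Least-squares estimate of [a, b1, b2] in y(t) + a*y(t-1) = b1*u(t-1) + b2*u(t-2)."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    t = np.arange(2, len(y))  # start at t = 2 so that y(t-1), u(t-1), u(t-2) all exist
    # Rewrite (1) as y(t) = phi(t)^T theta with phi(t) = [-y(t-1), u(t-1), u(t-2)].
    Phi = np.column_stack([-y[t - 1], u[t - 1], u[t - 2]])
    theta, *_ = np.linalg.lstsq(Phi, y[t], rcond=None)
    return theta


# Illustrative, noise-free data generated by (1) with chosen "true" parameters.
rng = np.random.default_rng(0)
N = 500
a_true, b1_true, b2_true = -0.7, 1.0, 0.5
u = rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = -a_true * y[k - 1] + b1_true * u[k - 1] + b2_true * u[k - 2]

theta_hat = fit_arx(y, u)
print(theta_hat)  # equal to [-0.7, 1.0, 0.5] up to rounding, since the data are noise-free
```

If measurement noise is added to `y`, the same call returns estimates scattered around the true values instead of equal to them.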


Training  sets  and  mathematical  models 


Here is an archetypal problem in science and human learning: “We are shown a collection of vector pairs $\{[y(t), x(t)],\ t = 1, \dots, N\}$. Call this ‘the training set’. We are then shown a new value $x(N+1)$ and are asked to name a corresponding value $y(N+1)$.” The variable $t$ could be thought of as time, but could be anything. The vectors $y(t)$ and $x(t)$ may take values in any sets (finite sets, subsets of $\mathbb{R}^n$, or anything else), and the dimension of $x(t)$ could very well depend on $t$ (and could be unbounded). The formulation covers most kinds of classification and model building problems.

How can this problem be solved? The mathematical modelling approach is to construct a function $g_N(t, x(t))$ based on the “training” set, and to use this function to pair $y(t)$ with new $x(t)$:

$$y(t) = g_N(t, x(t)) \qquad (4)$$

Where do we get the function $g$ from? Essentially we have to search for it in a family of functions that is described (parametrized) in terms of a finite number of parameters. These parameters will be denoted by $\theta$. The family of candidate model functions will be called a model structure, and we write the function as

$$g(t, \theta, x(t)) \qquad (5)$$


The value $y(t)$ is thus matched against the “candidate” $g(t, \theta, x(t))$:

$$y(t) \approx g(t, \theta, x(t)) \qquad (6)$$

We shall also use the notation

$$\hat y(t|\theta) = g(t, \theta, x(t)) \qquad (7)$$




to stress that $\hat y$ is a “predicted” or “guessed” $y$-value. The search for a good model function is then carried out in terms of the parameters $\theta$, and the chosen value $\hat\theta_N$ gives us

$$g_N(t, x(t)) = g(t, \hat\theta_N, x(t)) \qquad (8)$$

The case (1) corresponds to

$$\begin{aligned} \theta &= [a,\ b_1,\ b_2] \\ x(t) &= [y(t-1),\ u(t-1),\ u(t-2)] \qquad (9) \\ g(t, \theta, x(t)) &= -a\,y(t-1) + b_1 u(t-1) + b_2 u(t-2) \end{aligned}$$

In general the function $g$ is a mapping from the set where $x(t)$ takes its values to the space where $y(t)$ takes its values. All kinds of parametrizations are possible, from ones that are tailor-made for the application to general orthogonal function expansions and neural net structures.


Signals  and  Dynamical  Systems 


The  general  formulation  above  fits  into  conventional  system  identification, 
which  corresponds  to  particular  functions  g.  We  have  already  shown  that 
the  simple  (ARX)  model  (1)  fits  into  the  framework. 

In general the task of forming the pairs $[y(t), x(t)]$ is the one-step-ahead prediction problem. For example, the first-order ARMA model of $\{y(t)\}$

$$y(t) + a\,y(t-1) = e(t) + c\,e(t-1) \qquad (10)$$

where $\{e(t)\}$ is a white noise sequence, is obtained for

$$\theta = [a\ \ c], \qquad x(t) = [y(0),\ y(1),\ \dots,\ y(t-1)]$$

$$g(t, \theta, x(t)) = \sum_{k=0}^{t-1} (c - a)(-c)^{t-k-1}\, y(k) \qquad (11)$$

and so on. Fuzzy, or verbal, dynamical models can be obtained if $x(t)$ and $y(t)$ take on values like “the oven is very hot”, “the oven is warm”, “the water is boiling” and so on. The function $g$ would then be some kind of table, perhaps implemented in an expert system shell, and its parameters $\theta$ would describe the structure of the table.
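The closed form (11) can be checked numerically. The sketch below uses illustrative values of $a$ and $c$, and assumes that the data start at $y(0)$ with a zero initial predictor state. It evaluates (11) directly and also through the equivalent recursion $\hat y(t) = -c\,\hat y(t-1) + (c-a)\,y(t-1)$, which follows from (11) by splitting off the $k = t-1$ term. When the data come from (10) with $y(0) = e(0)$, the prediction errors $y(t) - \hat y(t|\theta)$ reproduce $e(t)$ exactly.

```python
import numpy as np


def predictor_sum(y, a, c):
    """y_hat(t|theta) from the explicit sum (11): sum_{k=0}^{t-1} (c - a) (-c)^(t-k-1) y(k)."""
    yhat = np.zeros(len(y))
    for t in range(1, len(y)):
        k = np.arange(t)
        yhat[t] = np.sum((c - a) * (-c) ** (t - k - 1) * y[:t])
    return yhat


def predictor_recursive(y, a, c):
    """Same predictor via y_hat(t) = -c*y_hat(t-1) + (c - a)*y(t-1), starting from y_hat(0) = 0."""
    yhat = np.zeros(len(y))
    for t in range(1, len(y)):
        yhat[t] = -c * yhat[t - 1] + (c - a) * y[t - 1]
    return yhat


# Illustrative data from the ARMA model (10), with y(0) = e(0).
rng = np.random.default_rng(1)
a, c, N = -0.8, 0.5, 200
e = rng.standard_normal(N)
y = np.zeros(N)
y[0] = e[0]
for t in range(1, N):
    y[t] = -a * y[t - 1] + e[t] + c * e[t - 1]

yhat_sum = predictor_sum(y, a, c)
yhat_rec = predictor_recursive(y, a, c)
print(np.max(np.abs(yhat_sum - yhat_rec)))  # agreement up to rounding
print(np.max(np.abs((y - yhat_rec) - e)))   # prediction errors equal e(t) up to rounding
```

The recursive form needs only constant work per sample. This is why predictors of this kind are usually computed by filtering rather than by the growing sum.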




Fitting  model  structures  to  data 

The leading principle for choosing $\theta$ is clearly to have $g(t, \theta, x(t))$ perform well on the training set, that is, to make

$$y(t) \ \text{close to}\ g(t, \theta, x(t)), \qquad t = 1, \dots, N \qquad (12)$$

This principle also applies to the case where $x$ and $y$ assume non-numeric values, provided “close” can be appropriately defined. Most numeric schemes select $\theta = \hat\theta_N$ so that

$$\sum_{t=1}^{N} \| y(t) - g(t, \theta, x(t)) \| \qquad (13)$$

is minimized for some norm $\|\cdot\|$ (“norm” should here be taken in a broad sense), or so that $y(t) - g(t, \theta, x(t))$ is uncorrelated with information in $x(t)$.
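With a quadratic norm, (13) becomes a sum of squared prediction errors. For the ARMA structure (10)-(11) this criterion is not quadratic in $\theta = [a\ c]$, so no closed-form minimizer like the one for (2) is available. The sketch below minimizes it by a coarse grid search. The grid search is chosen only for transparency; practical software uses iterative numerical search instead. The simulated data, the true values and the grid are assumptions of the example.

```python
import numpy as np


def prediction_errors(y, a, c):
    """epsilon(t) = y(t) - y_hat(t|theta) for the ARMA(1,1) predictor (11), with y_hat(0) = 0."""
    eps = np.empty(len(y))
    yhat = 0.0
    for t in range(len(y)):
        if t > 0:
            yhat = -c * yhat + (c - a) * y[t - 1]
        eps[t] = y[t] - yhat
    return eps


def criterion(y, a, c):
    """Quadratic-norm version of (13): the sum of squared prediction errors."""
    eps = prediction_errors(y, a, c)
    return float(np.dot(eps, eps))


# Simulated data from (10) with illustrative true values a = -0.7, c = 0.4.
rng = np.random.default_rng(2)
a_true, c_true, N = -0.7, 0.4, 1000
e = rng.standard_normal(N)
y = np.zeros(N)
y[0] = e[0]
for t in range(1, N):
    y[t] = -a_true * y[t - 1] + e[t] + c_true * e[t - 1]

grid = np.round(np.linspace(-0.9, 0.9, 19), 2)  # |c| < 1 keeps the predictor stable
_, a_hat, c_hat = min((criterion(y, a, c), a, c) for a in grid for c in grid)
print(a_hat, c_hat)
```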


Typical  asymptotic  properties 


A key question is: how good are the estimates obtained by (13)? The typical analysis goes as follows. Suppose that the pairs $[y(t), x(t)]$ really are related by

$$y(t) = g_0(t, x(t)) + v(t) \qquad (14)$$

where $\{v(t)\}$ is an as yet undefined sequence.


•  Consistency: Suppose there is a “true” system description available within the model structure. We translate that as: for some $\theta_0$, $g(t, \theta_0, x(t)) = g_0(t, x(t))$ and $v(t)$ is white noise. Then $\hat\theta_N$ will converge to $\theta_0$ as $N$ increases to infinity, and the difference $\sqrt{N}(\hat\theta_N - \theta_0)$ will converge in distribution to a Gaussian random variable (i.e., $\hat\theta_N$ tends to $\theta_0$ at the “rate” $1/\sqrt{N}$).

•  Convergence: Suppose no “true” description is available in the model structure, but assume that $\{v(t)\}$ in (14) is white noise. Then $\hat\theta_N$ will converge to a value $\theta_*$ such that $g(t, \theta_*, x(t))$ approximates $g_0(t, x(t))$ as well as possible in the chosen norm in (13). Moreover, $\sqrt{N}(\hat\theta_N - \theta_*)$ converges in distribution to a Gaussian random variable.




•  Making a sieve finer and finer: An interesting particular case is when the true system is assumed to belong to a very broad class of models that cannot be parametrized by a finite number of parameters. However, this class can be thought of as “the limit” of increasing model structures that are parametrized by more and more parameters. (Think, e.g., of an infinite-dimensional system that can be seen as the limit of finite impulse response models as the number of coefficients tends to infinity.) Mathematically this can be written as

$$g_0(t, x(t)) \in \lim_{d \to \infty} \{ g(t, \theta^d, x(t)) \} \qquad (15)$$

where the vector $\theta^d$ contains $d$ parameters. To deal with this case it is customary to employ more and more parameters as more and more data become available. That is, $d$ becomes a function of $N$: $d(N)$.

If $\{v(t)\}$ in (14) is white noise and if $d(N)$ is chosen to increase to infinity slowly enough with $N$, then the model will approach the true system as the number of data tends to infinity. This can be written formally as

$$g(t, \hat\theta_N^{d(N)}, x(t)) \to g_0(t, x(t)) \quad \text{as } N \to \infty \qquad (16)$$
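The $1/\sqrt{N}$ rate in the consistency statement above can be observed in a small Monte Carlo experiment. The sketch assumes a first-order ARX system with white input and white noise $v(t)$, so that the true system lies in the model structure. The parameter values are illustrative. It estimates $a$ by least squares for increasing $N$. Quadrupling $N$ should roughly halve the root-mean-square error, so $\sqrt{N}$ times the error should stay roughly constant.

```python
import numpy as np


def ls_arx1(y, u):
    """Least squares for y(t) + a*y(t-1) = b*u(t-1) + v(t); returns [a_hat, b_hat]."""
    Phi = np.column_stack([-y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta


def rms_error_of_a(N, reps=200, a=-0.5, b=1.0, noise_std=0.5, seed=0):
    """Monte Carlo root-mean-square error of a_hat over `reps` independent data sets of length N."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        u = rng.standard_normal(N)
        v = noise_std * rng.standard_normal(N)  # white noise: the true system is in the structure
        y = np.zeros(N)
        for t in range(1, N):
            y[t] = -a * y[t - 1] + b * u[t - 1] + v[t]
        errors.append(ls_arx1(y, u)[0] - a)
    return float(np.sqrt(np.mean(np.square(errors))))


for N in (100, 400, 1600):
    r = rms_error_of_a(N)
    print(N, round(r, 4), round(r * np.sqrt(N), 3))  # last column should stay roughly constant
```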

With this we conclude our brief exposé of mainstream system identification. See the textbooks, e.g., [6] and [10], for details of the general methods and results. Of course, system identification covers many other topics, such as how to compute the parameter values that minimize (13) and how to select the data so that they are as informative as possible.


2  The  Model  Structure 


The single most important step in the identification process is to decide upon a model structure such as (6). In practice a whole range of structures is typically tried out, and the process of identification really becomes the process of evaluating and choosing between the resulting models in these different structures.

It  is  natural  to  distinguish  between  three  types  of  model  structures. 




1.  Black-box  structures 

2.  Structures  from  physical  modelling 

3.  Structures  from  semi-physical  modelling 

2.1  Black-box  structures 


A black-box structure is one where the parametrization in terms of $\theta$ is chosen so that the family of models $\{g(t, \theta, x(t)),\ \theta \in D_M\}$ covers as many “common and interesting” ones as possible. No particular attention is then paid to the actual application. For a linear system (a linear mapping from past data to future ones) we could, for example, choose the parameters as the impulse response coefficients of a finite impulse response model

$$\hat y(t|\theta) = \sum_{k=1}^{M} \theta_k u(t-k) \qquad (17)$$

More common in control applications is the ARX black-box structure for linear systems:

$$\hat y(t|\theta) = -a_1 y(t-1) - a_2 y(t-2) - \dots - a_n y(t-n) + b_1 u(t-1) + \dots + b_m u(t-m) \qquad (18)$$

This structure may be regarded as “the mother of all dynamical model structures”.


In general we can write a black-box structure conceptually as

$$\hat y(t|\theta) = \sum_{k=1}^{M} \theta_k h_k(x(t)) \qquad (19)$$

i.e., as some kind of function expansion. In the general case the basis functions $\{h_k\}$ may also depend on $\theta$.

It is instructive to distinguish between two principally different kinds of basis functions:

•  Global: Each of the $h_k$ has support in the whole $x$-space.

•  Local: Each of the $h_k$ has support only in a small local box in the $x$-space.
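The two kinds of basis functions can be contrasted on a static toy problem. In the sketch below, which is illustrative only, the expansion (19) is linear in $\theta$, so both variants are fitted by least squares. The global variant uses powers of $x$ as basis functions. The local variant uses box indicators, and for these each $\theta_k$ turns out to be the average of the observed $y$ in box $k$. The data $y = x^2$ and the box edges are assumptions of the example.

```python
import numpy as np


def fit_expansion(H, y):
    """Least-squares theta for y_hat = sum_k theta_k h_k(x), where H[t, k] = h_k(x(t))."""
    theta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return theta


def global_poly_basis(x, M):
    """Global basis: powers x**0, ..., x**(M-1), each nonzero over (almost) the whole x-axis."""
    return np.column_stack([x ** k for k in range(M)])


def local_box_basis(x, edges):
    """Local basis: h_k(x) = 1 if x lies in box k = [edges[k], edges[k+1]), else 0."""
    idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)  # last edge goes to last box
    H = np.zeros((len(x), len(edges) - 1))
    H[np.arange(len(x)), idx] = 1.0
    return H


x = np.linspace(-1.0, 1.0, 201)
y = x ** 2                         # static "true" relation, for illustration only
edges = np.linspace(-1.0, 1.0, 5)  # four equal boxes

theta_global = fit_expansion(global_poly_basis(x, 3), y)
H_local = local_box_basis(x, edges)
theta_local = fit_expansion(H_local, y)
print(theta_global)  # approximately [0, 0, 1]
print(theta_local)   # one value per box: the average of y over that box
```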




Among  black-box  structures  that  use  global  basis  functions  are  all  the 
usual  linear  black  box  models,  Volterra  series  expansions  and  so  on. 

Local basis function models can be visualized as a multidimensional table: the $x$-space has been split up into a number of boxes. A new observation $x(t)$ then falls into one of these boxes, say the one corresponding to $h_k$, and the predicted output is then taken as $\theta_k$ (or possibly interpolated, taking into account a few neighboring boxes). The sizes and locations of the boxes can be determined with the aid of estimation data. The extreme case is when the boxes are determined so that exactly one data point $x(t)$, $t = 1, \dots, N$, falls in each box: this is the so-called nearest neighbor approach [13]. All this is well established in the statistical literature under the names “non-parametric regression” and “density estimation”, e.g., [11], [2].
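In the extreme case just described, each training point $x(t)$ defines its own box, and prediction reduces to looking up the nearest stored regressor. A minimal sketch, assuming Euclidean distance and regressors stored as the rows of a two-dimensional array; the data are made up.

```python
import numpy as np


def nearest_neighbor_predict(X_train, y_train, X_new):
    """Predict y for each row of X_new as the y-value of the closest row of X_train.

    X_train: array of shape (N, dim), one regressor x(t) per row.
    X_new:   array of shape (M, dim).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    # Squared Euclidean distances between every new point and every training point: shape (M, N).
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(y_train)[d2.argmin(axis=1)]


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_train = np.array([10.0, 20.0, 30.0])
pred = nearest_neighbor_predict(X_train, y_train, [[0.9, 0.1], [0.1, 0.8]])
print(pred)  # [20. 30.]
```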

Neural network model structures [8] represent a spectacular revival of these techniques. So-called radial basis networks correspond to localized bases (where the “boxes” overlap like Gaussian distribution functions), while the feed-forward sigmoid network formally would use global basis functions (although the “dynamic effects” really are localized). Fuzzy modelling [5] is again an example of localized basis functions, typically with polynomial interpolation rules, which are inherited by the “membership functions”.

It is worth stressing that these new techniques of neural net modelling and fuzzy identification represent a useful revitalization of non-linear black-box modelling with some new particular structures, but at the same time they definitely fall into a very old and classical framework of estimation techniques (see, e.g., [7], [1]).


2.2  Structures  from  physical  modelling 


If we have physical insight into the properties of the system to be identified, it is natural to exploit it: “Don't estimate what you already know!” Basically we then write down the physical laws and relationships that describe the system. Most often they are then summarized in a state space form like

$$\begin{aligned} \dot x(t) &= f(t, x(t), \theta, u(t), v(t)) \\ y(t) &= h(t, x(t), \theta, v(t)) \end{aligned} \qquad (20)$$

where $\theta$ denotes unknown physical constants in the description. The identification process is then to estimate these constants. That route takes us from (20) via (7) (explicitly or implicitly) and (12) to the estimate $\hat\theta_N$. The work needed to arrive at (20) and then to actually carry out the minimization of (12) can, however, be considerable.


2.3  Semiphysical  model  structures 

The logical route to utilizing available physical knowledge may, as pointed out, be quite laborious. It is then tempting to try instead some simple black-box structures, such as the ARX model (18) (“Try Simple Things First”). This is quite OK, but it should in any case be combined with physical insight. Here is a toy example to illustrate the point:

“Suppose we want to build a model for how the voltage applied to an electric heater affects the temperature of the room. Physical modelling entails writing down all equations relating to the power of the heater, heat transfer, heat convection and so on. This involves several equations, expressions and unknown heat transfer coefficients. A simple black-box approach would instead be to use, say, the ARX model (18) with $u$ as the applied voltage and $y$ as the room temperature. But that's too simple! A moment's reflection reveals that it's the heater power, rather than the voltage, that gives the temperature change. Thus use (18) with $u$ = squared voltage and $y$ = room temperature.”
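The heater example can be reproduced with simulated data. The sketch assumes a hypothetical first-order room model in which the temperature responds linearly to heater power, that is, to the squared voltage. It fits the ARX structure (18) with $n = m = 1$ twice: once with $u$ = voltage and once with $u$ = squared voltage. The model constants are invented for illustration.

```python
import numpy as np


def arx1_residual_rms(y, u):
    """Fit y(t) + a*y(t-1) = b*u(t-1) by least squares; return the RMS of the residuals."""
    Phi = np.column_stack([-y[:-1], u[:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return float(np.sqrt(np.mean((y[1:] - Phi @ theta) ** 2)))


# Hypothetical room: temperature (relative to ambient) driven by heater power, i.e. voltage squared.
rng = np.random.default_rng(3)
N = 500
voltage = rng.uniform(0.0, 10.0, N)
temp = np.zeros(N)
for t in range(1, N):
    temp[t] = 0.9 * temp[t - 1] + 0.05 * voltage[t - 1] ** 2

rms_raw = arx1_residual_rms(temp, voltage)        # black box: u = applied voltage
rms_semi = arx1_residual_rms(temp, voltage ** 2)  # semi-physical: u = squared voltage
print(rms_raw, rms_semi)
```

The squared-voltage regressor reproduces the simulated data exactly, while the raw-voltage ARX model leaves a clearly nonzero residual that no choice of $a$ and $b$ can remove.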

I would like to coin the term semi-physical modelling for introducing non-linear transformations of the raw measurements, based on high-school physics and common sense. The transformed measurements are then used in black-box structures such as the ARX structure.

Clearly  semi-physical  modelling  is  in  frequent  use.  It  is  however  also 
true  that  many  failures  of  identification  are  indeed  to  be  blamed  on  not 
applying  this  principle. 




2.4  Hybrid  structures 

Of particular current interest is to conceive model structures that are capable of dealing both with dynamic effects, described by differential/difference equations, and with logical constraints, “the if:s and the but:s” of the system. Not many concrete results have yet been obtained in this area, but quite intense work is going on now. We may point to some work on using tree models and pattern recognition for these hybrid model structures [12], [9].


3  Model  validation 


It is not enough to come up with a nominal model $\hat\theta_N$ from (13); we must also have a measure of its reliability. Model validation is the process of examining the model, assessing its quality and possibly rejecting its use for the purpose in question. In a sense this could be viewed as the essential process of identification: the estimation phase is really just a means to provide candidate models that might pass through the needle's eye of validation.

Model validation has at least these different objectives:


1.  To  decide  if  the  model  is  “good  enough”  for  the  intended  application 

2.  To  decide  how  “far  from  the  true  system  description”  the  model 
might  be 

3.  To  decide  whether  the  model  and  the  data  indeed  are  consistent 
with  assumptions  of  the  model  structure. 


These  objectives  partly  overlap,  but  it  is  still  possible  to  single  out  basic 
techniques: 

1.  The most obvious and pragmatic way to decide if a model is good enough is to test how well it is able to reproduce validation data (data that were not used to estimate the model) in simulation or prediction. The user can then decide by eye inspection whether the fit is “good enough”. In my mind this is the prime validation tool.

2.  Determining error bounds (how far is the true system from the model?) is a fundamentally difficult question. If we adopt a probabilistic setting and assume that the true system is to be found within the chosen structure, it becomes a matter of seeing how much the stochastic disturbances might have affected the model. The covariance matrix of the asymptotic distribution is classically used for the error bounds in this case. This covariance matrix is generally given by

$$\operatorname{cov}\hat\theta_N \approx \frac{1}{N}\, E v^2(t) \left[\operatorname{cov}\left(\frac{d}{d\theta}\hat y(t|\theta)\right)\right]^{-1} \qquad (21)$$

for the structure (7), (14). If we (according to item 3 below) cannot disprove that the true system can be represented in the chosen structure, it is still reasonable to use the measure (21).

The remaining cases, where no probabilistic setting is adopted and/or the model structure used is known to be too simple, have spurred considerable interest recently [4]. It would lead too far to review that literature here.

3.  Testing whether the data and the model are consistent with the model structure assumptions is again a more straightforward task. Basically we compute the residuals $\varepsilon(t) = y(t) - \hat y(t|\hat\theta_N)$ from the model and a (validation) data set, and check if

(a)  $|\varepsilon(t)| < C$ in a deterministic setting;

(b)  $\varepsilon(t)$ and $u(t - \tau)$ are independent random variables, in a probabilistic setting ($u$ is the input to the system).

The  latter  test  is  one  of  many  residual  analysis  tests  that  can  be 
performed,  and  this  is  standard  statistical  practice,  see  e.g.  [3]. 
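Test (b) is commonly carried out by estimating the cross-correlation between the residuals and past inputs, then comparing it with an approximate confidence band. The sketch below is only an illustration. It simulates a hypothetical second-order system with white noise and fits two structures: the correct one, and one that omits the $u(t-2)$ term. The band $\pm 2.58/\sqrt{N}$ is an approximate 99% level, valid only under the assumption that the residuals are white and independent of the input.

```python
import numpy as np


def ls_residuals(Phi, target):
    """Least-squares fit of target by Phi @ theta; return the residuals."""
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return target - Phi @ theta


def cross_corr(eps, u, max_lag):
    """Normalized sample cross-correlation between eps(t) and u(t - tau), tau = 1..max_lag."""
    scale = np.sqrt(np.dot(eps, eps) * np.dot(u, u))
    return np.array([np.dot(eps[tau:], u[:-tau]) / scale for tau in range(1, max_lag + 1)])


rng = np.random.default_rng(4)
N = 2000
u = rng.standard_normal(N)
v = 0.3 * rng.standard_normal(N)
y = np.zeros(N)
for k in range(2, N):
    y[k] = 0.5 * y[k - 1] + 1.0 * u[k - 1] + 0.8 * u[k - 2] + v[k]

idx = np.arange(2, N)
Phi_full = np.column_stack([-y[idx - 1], u[idx - 1], u[idx - 2]])  # correct structure
Phi_short = Phi_full[:, :2]                                        # u(t-2) term omitted
r_full = cross_corr(ls_residuals(Phi_full, y[idx]), u[idx], 10)
r_short = cross_corr(ls_residuals(Phi_short, y[idx]), u[idx], 10)
bound = 2.58 / np.sqrt(len(idx))
print(np.abs(r_full).max(), np.abs(r_short).max(), bound)
```

The residuals of the too-simple structure are strongly correlated with $u(t-2)$, which flags the missing term.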


4  Conclusions 


We have in this contribution pointed to some basic issues in how to build mathematical models of real-life dynamical systems. The situation is well consolidated for purely dynamical systems, i.e., those that can be described by difference or differential equations. The basic principles, efficient algorithms and widely spread commercial software packages are well established. These techniques have also been in industrial use for quite some time.

Of great current interest is to move on to systems that are also characterized by logical constraints, by dynamical properties that switch depending on certain logical conditions, and so on. This will probably require joint efforts between the control and computer science communities.


References 

[1]  A.R. Barron. Statistical properties of artificial neural networks. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 280-285, 1989.

[2]  L. Devroye and L. Györfi. Non-parametric Density Estimation. Wiley, New York, 1985.

[3]  N.R. Draper and H. Smith. Applied Regression Analysis, 2nd ed. Wiley, New York, 1981.

[4]  G.C. Goodwin, M. Gevers, and B. Ninness. Quantifying the error in estimated transfer functions with application to model order selection. IEEE Trans. Automatic Control, 37(7):913-929, 1992.

[5]  C.W. Ku and Y.Z. Lu. Fuzzy model identification and self-learning. IEEE Trans. on SMC, 17, 1987.

[6]  L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, N.J., 1987.

[7]  L. Ljung and J. Sjöberg. A system identification perspective on neural nets. In S.Y. Kung et al., editors, Neural Networks for Signal Processing, Proc. of the 1992 IEEE-SP Workshop, pages 423-435. IEEE Press, 1992.

[8]  K.S. Narendra and K. Parthasarathy. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Networks, 1:4-27, 1990.

[9]  A. Skeppstedt, L. Ljung, and M. Millnert. Construction of composite models from observed data. Int. J. Control, 55(1):141-152, 1992.

[10]  T. Söderström and P. Stoica. System Identification. Prentice-Hall Int., London, 1989.

[11]  C.J. Stone. Consistent non-parametric regression (with discussion). Ann. Statist., 5:595-645, 1977.

[12]  J.E. Strömberg, F. Gustafsson, and L. Ljung. Trees as black-box model structures for dynamical systems. In Proc. 1st European Control Conference (ECC'91), pages 1175-1180, Grenoble, France, 1991.

[13]  T.M. Cover and P.E. Hart. Nearest neighbor pattern classification. IEEE Trans. Info. Theory, IT-13:21-27, 1967.


On  the  Satisfiability  of  Symmetrical 
Constrained  Satisfaction  Problems 


Jean-Francois  Puget 

ILOG SA, 2 avenue Galliéni, BP 85, F-94253 Gentilly Cedex, FRANCE
email: puget@ilog.fr


Abstract. Constraint satisfaction problems (CSP) are a class of combinatorial problems that can be solved efficiently by combining consistency methods such as arc-consistency with a backtracking search. However, these techniques are not adapted to symmetrical CSPs. In fact one can exhibit rather small CSPs that cannot be solved with consistency techniques. The relevance of this symmetry problem to real-world applications is very strong, since it can prevent a CSP solver from solving even small instances of real-world problems. This paper describes a general solution for this kind of problem. Both a theoretical study and experimental results using the constraint-based library PECOS are provided.


1  Introduction 

Constraint satisfaction problems (CSP) are a class of combinatorial problems that can be solved efficiently by combining local consistency methods, such as arc-consistency [8], with a backtracking search. The idea of consistency methods is to prune the search by preventing variable instantiations that are not consistent with the constraints of the problem. Different methods for this kind of pruning have been studied, e.g., [8], [7], [9], [10], [1], [3]. Section 2 provides notation and semantics for this kind of problem.

However, arc-consistency is not adapted to symmetrical CSPs. Avoiding symmetries is very important in real-world applications, since they can prevent a CSP solver from solving even small instances of real-world problems. The work presented in this paper was done while implementing a constraint-based programming library called PECOS [5]. This library has been used in a variety of industrial problems. In some of these problems, success was only possible because we could avoid the combinatorial explosion caused by symmetries.

In fact one can exhibit rather small CSPs that cannot be solved with consistency techniques. A simple problem is the pigeonhole CSP: one has to put $N + 1$ pigeons in $N$ holes such that each pigeon is in a different hole. The problem can be viewed as a CSP by associating one variable with each pigeon, the value of which is the hole where the pigeon is placed. These variables must be different from each other. This CSP has no solutions since there is one more pigeon than holes. However, even for small instances ($N = 20$), this problem cannot be solved by usual CSP solvers based on consistency techniques.

A typical CSP solver may proceed as follows. It first instantiates the first $N - 1$ variables, thus using $N - 1$ holes. Then it tries to instantiate the last two but fails, since there is only one remaining hole for two variables. Thus the solver will try another instantiation for the first $N - 1$ variables, and then fail again. The solver will have to try every possible instantiation of the first $N - 1$ variables to prove that there are no solutions. There are roughly $(N - 1)!$ such possible instantiations. For $N = 20$ this gives about $10^{17}$ different possibilities, which is clearly too much for current algorithms.
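The blow-up can be reproduced on small instances. The sketch below is a plain backtracking search with forward checking, a simple consistency technique. It counts how many values are tried before proving that $N + 1$ pigeons do not fit in $N$ holes. The optional `break_symmetry` flag adds ordering constraints $x_i < x_j$ for $i < j$. This is a common way of removing the interchangeability of the pigeons, used here only as an illustrative assumption; it is not presented as the construction of this paper. The function and parameter names are hypothetical.

```python
def pigeonhole(n_holes, break_symmetry=False):
    """Search for a placement of n_holes + 1 pigeons in n_holes distinct holes.

    Backtracking with forward checking: after pigeon i takes hole h, h is removed from the
    domains of later pigeons, and the branch is abandoned if some domain becomes empty.
    Returns (solution_found, number_of_values_tried).
    """
    n_pigeons = n_holes + 1
    nodes = 0

    def search(i, domains):
        nonlocal nodes
        if i == n_pigeons:
            return True
        for h in sorted(domains[i]):
            nodes += 1
            new_domains = list(domains)
            ok = True
            for j in range(i + 1, n_pigeons):
                d = domains[j] - {h}
                if break_symmetry:
                    # Hypothetical extra constraints x_i < x_j for i < j (pigeons are interchangeable).
                    d = {v for v in d if v > h}
                if not d:
                    ok = False
                    break
                new_domains[j] = d
            if ok and search(i + 1, new_domains):
                return True
        return False

    found = search(0, [set(range(n_holes)) for _ in range(n_pigeons)])
    return found, nodes


for n in range(2, 8):
    print(n, pigeonhole(n)[1], pigeonhole(n, break_symmetry=True)[1])
```

Without the extra constraints the count grows factorially: it is $\sum_{k=1}^{N} N!/(N-k)!$, i.e. 1956 values for $N = 6$. With them, only increasing placements are explored.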

The problem is due to the symmetries of the CSP: any permutation of the variables of the CSP does not change the CSP. Section 3 formalizes the notion of symmetrical problems. A smarter approach would avoid testing the permutations of a subproblem solution if that solution leads to a failure. This approach has been studied in [1] and [13] (see the related work section). We present another possible approach: always avoid testing the permutations of a subproblem solution, by adding new constraints, as explained in Section 4. We then show in Section 5 how our method can solve a very difficult problem known as Ramsey's problem. We end the paper with a comparison with related work and some benchmark results.

2  Semantics 


Two types of semantics have been proposed for constraint satisfaction problems. The first one describes a CSP as a graph, where nodes are variables and arcs are binary constraints [8], [9], hence the name arc-consistency. The other, called Constraint Logic Programming, describes constraint satisfaction as an extension of unification [6], [14]. We will describe an intermediate semantics: it does not use unification or Herbrand terms, and it makes a distinction between a constraint and a constraint stated on specific variables (a literal). This semantics was first proposed in [12].

In the remainder of the paper, $\mathcal{V}$ represents a set of variables, and $\mathcal{D}$ a set of constants. Intuitively, $\mathcal{D}$ is the set of possible values for the variables of $\mathcal{V}$. Consistency techniques associate with each variable the set of possible values for that variable, usually called the domain of the variable.

Definition (domains and values). A domain assignment is a mapping from $\mathcal{V}$ to the powerset of $\mathcal{D}$. If $v$ is a variable and $dom$ a domain assignment, $dom(v)$ is called the domain associated with $v$ by $dom$. A domain assignment $dom$ is represented by the set of pairs $v/dom(v)$ for all $v$ such that $dom(v) \neq \emptyset$. A variable $v$ is instantiated by a domain assignment $dom$ iff $dom(v)$ is a singleton $\{d\}$. In that case, $d$ is called the value of the variable.




The semantics of a constraint is usually defined by the set of tuples satisfying the constraint. A literal is a constraint applied to variables:

Definition (constraint, literal). A constraint $C$ of arity $n$ is defined by a subset $ext(C)$ of the Cartesian product $\mathcal{D}^n$. A constrained literal is a formula $C(v_1, \dots, v_n)$, where $C$ is a constraint of arity $n$, and $v_1, \dots, v_n$ are variables of $\mathcal{V}$.


A problem is then defined by a set of variables, a domain and a set of constrained literals:

Definition (CSP). A Constraint Satisfaction Problem is a triple $(\mathcal{V}, \mathcal{D}, \mathcal{C})$ where $\mathcal{C}$ is a conjunction of constrained literals.

The semantics of a constrained problem is the following:

Definition (solutions). An interpretation is a domain assignment that instantiates all the variables of $\mathcal{V}$, i.e., $\forall v \in \mathcal{V},\ \exists d \in \mathcal{D},\ dom(v) = \{d\}$.

A constrained literal $C(v_1, \dots, v_n)$ is satisfied by an interpretation $dom$ iff $(dom(v_1), \dots, dom(v_n)) \in ext(C)$.

An interpretation $dom$ is a solution for a problem $(\mathcal{V}, \mathcal{D}, \mathcal{C})$ iff every constrained literal in $\mathcal{C}$ is satisfied by $dom$.

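For instance, if

$$\mathcal{V} = \{x, y, z\}, \qquad \mathcal{D} = \{0, 1, 2\}, \qquad \mathcal{C} = x \neq y \wedge y \neq z \wedge z \neq x$$

then a possible domain assignment is

$$x/\{0,1,2\},\ y/\{1,2\},\ z/\{1,2\}$$

There are six solutions for this problem, including:

$$x/\{0\},\ y/\{1\},\ z/\{2\}$$

$$x/\{0\},\ y/\{2\},\ z/\{1\}$$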
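For a problem this small, the definition of a solution can be applied literally by enumerating all interpretations. The brute-force sketch below is for illustration only. It represents each constraint by its extension, a set of value tuples, and each constrained literal by a pair (extension, variables). It writes an interpretation as a plain variable-to-value mapping instead of singleton domains.

```python
from itertools import product

D = [0, 1, 2]
ext_neq = {(a, b) for a in D for b in D if a != b}  # ext(!=) as an explicit set of pairs
variables = ["x", "y", "z"]
C = [(ext_neq, ("x", "y")), (ext_neq, ("y", "z")), (ext_neq, ("z", "x"))]  # constrained literals


def solutions(variables, D, C):
    """Yield every interpretation (one value per variable) that satisfies all literals in C."""
    for values in product(D, repeat=len(variables)):
        value = dict(zip(variables, values))
        if all(tuple(value[v] for v in args) in ext for ext, args in C):
            yield value


sols = list(solutions(variables, D, C))
print(len(sols))  # 6
```

Enumeration is exponential in the number of variables, which is why the consistency-based pruning discussed next matters.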

The usual way to solve CSPs is to use an enumeration technique together with local consistency pruning, such as arc-consistency [8]. Several local consistency algorithms have been proposed, including AC-3 [8], AC-4 [9], GAC-4 [10] and AC-5 [3].
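For binary constraints, arc-consistency can be enforced with an AC-3 style propagation loop. The sketch below is a compact illustration written for this section, not the code of PECOS or of [8]. Constraints are given as predicates on ordered pairs of variables, listed in both directions. On the running example, fixing $z$ to 1 propagates to $y = 2$ and $x = 0$. On three pairwise-different variables over two values, a small pigeonhole instance, it removes nothing: this is the weakness discussed in the introduction.

```python
from collections import deque
import operator


def ac3(domains, constraints):
    """Enforce arc-consistency in place.

    domains:     dict variable -> set of values (modified in place)
    constraints: dict (xi, xj) -> predicate(vi, vj); each binary constraint listed in both directions
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)  # start with every arc (xi, xj)
    while queue:
        xi, xj = queue.popleft()
        pred = constraints[(xi, xj)]
        # Revise: remove the values of xi that have no support in the domain of xj.
        unsupported = {vi for vi in domains[xi] if not any(pred(vi, vj) for vj in domains[xj])}
        if unsupported:
            domains[xi] -= unsupported
            if not domains[xi]:
                return False
            # The domain of xi shrank, so arcs (xk, xi) must be re-examined.
            queue.extend((xk, x) for (xk, x) in constraints if x == xi and xk != xj)
    return True


def all_different(variables):
    """Pairwise != constraints, listed in both directions."""
    return {(p, q): operator.ne for p in variables for q in variables if p != q}


cons = all_different(["x", "y", "z"])

d1 = {"x": {0, 1, 2}, "y": {1, 2}, "z": {1, 2}}
ok1 = ac3(d1, cons)
print(ok1, d1)  # nothing removed

d2 = {"x": {0, 1, 2}, "y": {1, 2}, "z": {1}}
ok2 = ac3(d2, cons)
print(ok2, d2)  # z = 1 forces y = 2, then x = 0

d3 = {v: {0, 1} for v in "xyz"}  # 3 pigeons, 2 holes
ok3 = ac3(d3, cons)
print(ok3, d3)  # arc-consistent, although no solution exists
```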

In the remainder of the paper, we assume that $\mathcal{D}$ is a set of integers, together with the usual ordering. We will use four basic constraints: $=$, $\neq$, $\leq$ and $<$. The above semantics can be extended when there is an ordering relation on the set of values $\mathcal{D}$ [2]. This semantics allows for a very simple definition of the constraints $=$, $\neq$, $\leq$ and $<$. However, these constraints can also be defined using the non-ordered semantics presented here. For instance, if $\mathcal{D}$ is $\{0, 1, 2\}$, the semantics of the constraints is the following:

$$\begin{aligned} ext(=) &= \{(0,0), (1,1), (2,2)\} \\ ext(\neq) &= \{(0,1), (1,0), (0,2), (2,0), (1,2), (2,1)\} \\ ext(\leq) &= \{(0,0), (0,1), (0,2), (1,1), (1,2), (2,2)\} \\ ext(<) &= \{(0,1), (0,2), (1,2)\} \end{aligned}$$
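The extensions listed above follow mechanically from the usual ordering on the integers. A short Python sketch, assuming $\mathcal{D} = \{0, 1, 2\}$ as in the text, builds each $ext$ as the set of pairs satisfying the corresponding comparison from the standard `operator` module. The ASCII names `!=` and `<=` stand for $\neq$ and $\leq$.

```python
import operator

D = [0, 1, 2]
comparisons = {"=": operator.eq, "!=": operator.ne, "<=": operator.le, "<": operator.lt}
# ext(C): every pair (a, b) in D x D for which the comparison holds.
ext = {name: {(a, b) for a in D for b in D if op(a, b)} for name, op in comparisons.items()}
for name, pairs in ext.items():
    print(name, sorted(pairs))
```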

3  Symmetrical  CSPs 

A symmetrical problem is a problem where some permutations of the variables map a solution onto another solution. We first define more precisely what a permutation of the variables of a problem is.

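Definition (permutation of variables). A permutation of the variables of a CSP $(\mathcal{V}, \mathcal{D}, \mathcal{C})$ is a bijection from $\mathcal{V}$ to $\mathcal{V}$. If $\tau$ is a permutation of the variables of a CSP $\mathcal{P} = (\mathcal{V}, \mathcal{D}, \mathcal{C})$, the permuted CSP $\tau(\mathcal{P})$ is the CSP $(\mathcal{V}, \mathcal{D}, \tau(\mathcal{C}))$ defined by:

$$\tau(\mathcal{C}) = \{ C(\tau(v_1), \tau(v_2), \dots, \tau(v_n)) \ \text{such that}\ C(v_1, v_2, \dots, v_n) \in \mathcal{C} \}$$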

We  will  speak  of  a  permutation,  instead  of  a  permutation  of  variables,  wher¬ 
ever  it  is  not  ambiguous  to  do  so. 

The notion of symmetrical constraint is the usual mathematical notion of symmetrical relation:

Definition (symmetrical constraint) A constraint $C$ of arity $n$ is symmetrical if for any permutation $\tau$ of the integers $\{1, \ldots, n\}$,

$$\forall a_1, \ldots, a_n \in \mathcal{D},\quad (a_1, \ldots, a_n) \in \mathit{ext}(C) \Rightarrow (a_{\tau(1)}, \ldots, a_{\tau(n)}) \in \mathit{ext}(C)$$

For instance, the constraint $\neq$ is symmetrical. This notion is the basis for an equivalence of problems with respect to symmetries:

Definition (equality with respect to symmetries) Given a constraint $C$, two literals $C_1$ and $C_2$ obtained from $C$ are equal w.r.t. symmetries, noted $C_1 =_{sym} C_2$, iff either:

- $C_1 = C_2$, or
- $C_1$ and $C_2$ bear on the same variables, and $C$ is symmetrical.

We are interested in the permutations that map a solution onto a solution:

Definition (consistent permutation of variables) A permutation $\tau$ is consistent with a CSP $(V, \mathcal{D}, C)$ if $\tau(C) =_{sym} C$.
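The definitions above can be sketched in a few lines of Python, under two assumptions. A literal is encoded as a pair (constraint name, tuple of variables), and only the constraint names listed in `SYMMETRIC` are symmetrical. `permute` computes $\tau(C)$. `canonical` normalises a literal so that equality w.r.t. symmetries becomes plain equality. `is_consistent` compares the two normalised literal sets, which is our reading of $\tau(C) =_{sym} C$. All of these helpers are hypothetical.

```python
SYMMETRIC = {"ne", "eq"}  # assumption: the constraint names treated as symmetrical

def permute(constraints, tau):
    """tau(C): rename every variable of every literal according to tau."""
    return [(name, tuple(tau[v] for v in args)) for name, args in constraints]

def canonical(literal):
    """Normal form for equality w.r.t. symmetries: argument order is ignored
    for symmetrical constraints and kept for the others."""
    name, args = literal
    return (name, tuple(sorted(args))) if name in SYMMETRIC else (name, args)

def is_consistent(constraints, tau):
    """tau is consistent iff tau(C) =_sym C (literal sets compared after normalisation)."""
    before = {canonical(c) for c in constraints}
    after = {canonical(c) for c in permute(constraints, tau)}
    return before == after

C = [("ne", ("v0", "v1")), ("ne", ("v1", "v2")), ("ne", ("v2", "v0"))]
swap01 = {"v0": "v1", "v1": "v0", "v2": "v2"}
print(is_consistent(C, swap01))  # True

C_ordered = C + [("le", ("v0", "v1")), ("le", ("v1", "v2"))]
print(is_consistent(C_ordered, swap01))  # False
```

Adding the non-symmetrical ordering literals is enough to make the swap of `v0` and `v1` inconsistent. This is the effect the reductions of Section 4 rely on.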


This consistency notion has good properties: any combination of consistent permutations is a consistent permutation, and the inverse of a consistent permutation is also consistent. Thus we obtain the following result:

Theorem 1 (group of consistent permutations) The set of the consistent permutations of the variables of a CSP, with the composition law, is a group.

Definition (symmetrical CSP) A symmetrical CSP is a CSP with at least one consistent permutation other than the identity permutation.

The simplest example of a symmetrical CSP is the permutation CSP:

$$V = \{v_i\}_{0 \leq i < N}$$

$$\mathcal{D} = \{0, 1, \ldots, N-1\}$$

$$C = (\forall i, j,\ 0 \leq i < j < N,\ v_i \neq v_j)$$

If $N = 3$ the problem is:

$$v_0, v_1, v_2 \in \{0, 1, 2\}$$

$$v_0 \neq v_1 \wedge v_1 \neq v_2 \wedge v_2 \neq v_0$$

This CSP is symmetrical. Indeed, any swap of two variables gives an equivalent CSP. For instance, on the 3-variable CSP, if we swap $v_0$ and $v_1$ we obtain the following constraints:

$$v_1 \neq v_0 \wedge v_0 \neq v_2 \wedge v_2 \neq v_1$$

which are the same as the original ones w.r.t. symmetries.

More generally, consider the transposition of two variables $v_i$ and $v_j$, defined as follows:

$$\tau_{ij}:\quad v_i \to v_j,\quad v_j \to v_i,\quad v_k \to v_k \text{ if } k \neq i,\ k \neq j$$

Such a permutation is consistent with the CSP, since it does not change the set of constraints (w.r.t. symmetries). But a classical algebraic result states that any permutation is a combination of such transpositions. Thus, by theorem 1, any permutation of the variables of this CSP is consistent.
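The argument can be checked exhaustively on a small instance. The sketch below uses hypothetical helper names. It enumerates all $4! = 24$ permutations of the variables of the 4-variable permutation CSP, confirms that each one is consistent, and checks the closure property of Theorem 1 on this instance.

```python
from itertools import combinations, permutations

def permutation_csp(n):
    """Literals v_i != v_j for 0 <= i < j < n, as (name, (var, var)) pairs."""
    return [("ne", (f"v{i}", f"v{j}")) for i, j in combinations(range(n), 2)]

def canonical(literal):
    """'ne' is symmetrical, so argument order is ignored (equality w.r.t. symmetries)."""
    name, args = literal
    return (name, tuple(sorted(args)))

def is_consistent(constraints, tau):
    permuted = {canonical((name, tuple(tau[v] for v in args))) for name, args in constraints}
    return permuted == {canonical(c) for c in constraints}

def compose(s, t):
    """(s o t)(v) = s(t(v))."""
    return {v: s[t[v]] for v in t}

n = 4
names = [f"v{i}" for i in range(n)]
C = permutation_csp(n)
all_perms = [dict(zip(names, image)) for image in permutations(names)]
consistent = [tau for tau in all_perms if is_consistent(C, tau)]
print(len(consistent))  # 24, i.e. every one of the 4! permutations

# Theorem 1 on this instance: consistent permutations are closed under composition.
assert all(is_consistent(C, compose(s, t)) for s in consistent for t in consistent)
```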

When a CSP is symmetrical, any consistent permutation of the variables defines a permutation on the solutions of the CSP.

4 Symmetrical CSPs without solutions

This section presents the fundamental result at the core of our method, and then illustrates it on a simple example.

4.1 Valid reduction of a CSP

A simple way to remove symmetrical solutions is to add constraints to $\mathcal{P}$ in order to remove permutations of solutions. This can be formalised as follows:


Definition (reduction) A CSP $(V, \mathcal{D}, C')$ is a reduction of the CSP $(V, \mathcal{D}, C)$ if $C \subseteq C'$.

We call this a reduction because of the following result:

Proposition If $\mathcal{P}$ is a CSP and $\mathcal{P}'$ a reduction of $\mathcal{P}$, then the set of solutions of $\mathcal{P}'$ is included in the set of solutions of $\mathcal{P}$.

When considering symmetries, we look for reductions that do not remove a whole class of solutions, i.e. such that for any solution $\sigma$ of the CSP, there exists a solution of the reduced CSP which is a permutation of $\sigma$:

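Definition (valid reduction) $\mathcal{P}'$ is a valid reduction of the CSP $\mathcal{P}$ if:

- $\mathcal{P}'$ is a reduction of $\mathcal{P}$;
- for any solution $\sigma$ of $\mathcal{P}$ there exists a permutation $\tau_\sigma$ of the variables of $\mathcal{P}$, consistent with $\mathcal{P}$, such that $\sigma$ is a solution of $\tau_\sigma(\mathcal{P}')$.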

Let us first consider the 3-variable permutation problem:

$$v_0, v_1, v_2 \in \{0, 1, 2\}$$

$$v_0 \neq v_1 \wedge v_1 \neq v_2 \wedge v_2 \neq v_0$$

A valid reduction of this problem is obtained by adding the following constraints:

$$v_0 \leq v_1 \leq v_2$$

This is indeed a valid reduction. For instance, consider a solution of the CSP:

$$\sigma = (v_0/2,\ v_1/0,\ v_2/1)$$

This solution is not a solution of the reduced CSP. However, consider the permutation:

$$\tau_\sigma:\quad v_0 \to v_1,\quad v_1 \to v_2,\quad v_2 \to v_0$$

The permuted reduced CSP has the following constraints:

$$v_1 \neq v_2 \wedge v_2 \neq v_0 \wedge v_0 \neq v_1$$

$$v_1 \leq v_2 \leq v_0$$

Now, $\sigma$ is a solution of this permuted reduced CSP! By finding such a permutation for each solution of the CSP, we can prove that the reduced CSP is a valid reduction. Let us do this for the general case. The permutation CSP is:

$$V = \{v_i\}_{0 \leq i < N}$$

$$\mathcal{D} = \{0, 1, \ldots, N-1\}$$

$$C = (\forall i, j,\ 0 \leq i < j < N,\ v_i \neq v_j)$$

We add the following set of constraints:

$$C' = (\forall i, j,\ 0 \leq i < j < N,\ v_i \leq v_j)$$

The reduced CSP is equivalent to:

$$V = \{v_i\}_{0 \leq i < N}$$

$$\mathcal{D} = \{0, 1, \ldots, N-1\}$$

$$C = (\forall i, j,\ 0 \leq i < j < N,\ v_i < v_j)$$

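Proposition This CSP is a valid reduction of the permutation CSP.

The proof follows the idea used for the 3-variable example. If $\sigma$ is a solution of the permutation CSP, consider the following permutation:

$$\tau_\sigma:\quad v_{\sigma(v_i)} \to v_i \qquad (0 \leq i < N)$$

$\sigma$ is then a solution of the permuted reduced problem. Indeed, $\sigma(\tau_\sigma(v_k)) = k$ for every $k$, so the permuted ordering constraints $\tau_\sigma(v_i) \leq \tau_\sigma(v_j)$, $i < j$, hold under $\sigma$. The $\neq$ constraints are unchanged by $\tau_\sigma$ w.r.t. symmetries, and $\sigma$ already satisfies them.

We have just proved that for any solution $\sigma$ there exists a permutation of the variables such that $\sigma$ is a solution of the permuted reduced CSP. Thus the reduced CSP is a valid reduction.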
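For small $N$ the proposition can also be checked by brute force. In this sketch a solution $\sigma$ is represented as a tuple whose $i$-th entry is $\sigma(v_i)$. `tau_sigma` builds the permutation $\tau_\sigma$ from the proof, and the final assertion checks that every solution of the permutation CSP solves the corresponding permuted reduced problem. The representation and helper names are our own assumptions.

```python
from itertools import permutations

def permutation_csp_solutions(n):
    """Solutions of the permutation CSP: sigma[i] is the value of v_i, all values distinct."""
    return list(permutations(range(n)))

def reduced_solutions(n):
    """Solutions of the reduced CSP (v_0 < v_1 < ... < v_{n-1})."""
    return [s for s in permutation_csp_solutions(n) if all(a < b for a, b in zip(s, s[1:]))]

def tau_sigma(sigma):
    """Permutation from the proof, tau(v_{sigma(v_i)}) = v_i, stored as tau[k] = index of tau(v_k)."""
    tau = [0] * len(sigma)
    for i, value in enumerate(sigma):
        tau[value] = i
    return tau

def solves_permuted_reduction(sigma, tau):
    """sigma solves tau(P') iff sigma(tau(v_0)) < sigma(tau(v_1)) < ...; the != literals
    are unchanged by tau and already hold because sigma solves P."""
    chain = [sigma[tau[k]] for k in range(len(sigma))]
    return all(a < b for a, b in zip(chain, chain[1:]))

n = 4
print(len(permutation_csp_solutions(n)), len(reduced_solutions(n)))  # 24 1
print(tau_sigma((2, 0, 1)))  # [1, 2, 0]: v0 -> v1, v1 -> v2, v2 -> v0, as in the example
assert all(solves_permuted_reduction(s, tau_sigma(s)) for s in permutation_csp_solutions(n))
```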


4.2 Central result

The notion of valid reduction is very useful because of the following results:

Theorem 2 (sufficient condition for unsatisfiability) A CSP has no solution if there exists a valid reduction of this CSP which has no solution.

Proof: consider a CSP $\mathcal{P}$ and a valid reduction $\mathcal{P}'$ of it having no solutions. If $\sigma$ is a solution of $\mathcal{P}$, then $\sigma$ is a solution of $\tau_\sigma(\mathcal{P}')$ for some consistent permutation $\tau_\sigma$. Hence the assignment $v \mapsto \sigma(\tau_\sigma(v))$ is a solution of $\mathcal{P}'$, which contradicts our hypothesis.

Theorem 3 (a symmetrical problem has a valid reduction) A symmetrical CSP always has a valid reduction.

Proof: Consider a symmetrical CSP $\mathcal{P} = (V, \mathcal{D}, C)$. Then there exists at least one permutation $\tau$, other than the identity, consistent with this CSP. Thus there exists a variable $v_i \in V$ such that $\tau(v_i) = v_j \neq v_i$. Then one obtains a valid reduction $\mathcal{P}'$ of $\mathcal{P}$ by adding the constraint $v_i \leq v_j$. Indeed, if $\sigma$ is a solution of $\mathcal{P}$ such that $\sigma(v_i) > \sigma(v_j)$, then $\sigma$ is a solution of $\tau(\mathcal{P}')$.

These results are at the core of our method: the ideal goal is to be able to compute a non-symmetrical valid reduction of any symmetrical problem by adding ordering constraints. If the reduction is not satisfiable, then the original problem is not satisfiable either. Moreover, the unsatisfiability proof is easier on the reduced problem, since it has a smaller number of potential solutions.


4.3 The pigeonhole CSP

We first show how the preceding results can be used on a simple problem: the pigeonhole problem. This problem can be viewed as a CSP by associating one variable with each pigeon, the value of which is the hole where the pigeon is placed. The formal description of the problem is then similar to the one presented in the previous section, with one additional variable:

$$V = \{v_i\}_{0 \leq i \leq N}$$

$$\mathcal{D} = \{0, 1, \ldots, N-1\}$$

$$C = (\forall i, j,\ 0 \leq i < j \leq N,\ v_i \neq v_j)$$

The pigeonhole CSP obviously has no solution. It is also a symmetrical CSP, as can be shown by the same considerations that were used for the permutation CSP. The valid reduction is obtained by adding the same ordering constraints:

$$V = \{v_i\}_{0 \leq i \leq N}$$

$$\mathcal{D} = \{0, 1, \ldots, N-1\}$$

$$C = (\forall i, j,\ 0 \leq i < j \leq N,\ v_i < v_j)$$

Proposition This CSP is a valid reduction of the pigeonhole CSP.

The proof is similar to the one for the permutation CSP.

The reduced pigeonhole CSP is easily shown to be unsolvable, since local consistency is sufficient to detect the impossibility. Using theorem 2, this proves that the original CSP is also unsatisfiable.
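The sketch below shows why local consistency suffices here. It runs simple bounds propagation on the chain $v_0 < v_1 < \cdots < v_N$ with $N$ holes. Bounds propagation is one cheap form of local consistency; we do not claim it is the algorithm PECOS uses. The forward pass raises the lower bound of $v_i$ to $i$, so $v_N$ would need a value of at least $N$, which is not in $\mathcal{D}$. The function name and the chain-only encoding are our own assumptions.

```python
def propagate_chain(n_vars, n_values):
    """Bounds propagation for v_0 < v_1 < ... < v_{n_vars-1}, each in {0, ..., n_values-1}.
    Returns the final (lo, hi) bounds, or None if a domain is empty after a pass."""
    lo = [0] * n_vars
    hi = [n_values - 1] * n_vars
    changed = True
    while changed:
        changed = False
        for i in range(n_vars - 1):
            # v_i < v_{i+1} implies lo[i+1] >= lo[i] + 1 and hi[i] <= hi[i+1] - 1
            if lo[i + 1] < lo[i] + 1:
                lo[i + 1] = lo[i] + 1
                changed = True
            if hi[i] > hi[i + 1] - 1:
                hi[i] = hi[i + 1] - 1
                changed = True
        if any(l > h for l, h in zip(lo, hi)):
            return None
    return list(zip(lo, hi))

N = 10
print(propagate_chain(N + 1, N))      # None: N + 1 pigeons, N holes
print(propagate_chain(N, N)[:3])      # [(0, 0), (1, 1), (2, 2)]
```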

5 The Ramsey problem

We will now consider a much more difficult problem, which was first solved using our technique (see [11]): the Ramsey problem. Consider the complete graph with $N$ nodes (each node is connected to every other node). The problem is to color the edges of this graph with 3 colors, such that for any 3 nodes $n_1, n_2, n_3$, the three edges $(n_1, n_2)$, $(n_2, n_3)$, $(n_3, n_1)$ do not all have the same color (but two of them can have the same color). For $N = 16$, this problem has a lot of solutions. For $N = 17$ there is no solution. It is not very difficult to write a program for finding solutions for $N < 16$. The challenge is to write a program which solves the problem for $N \leq 16$, and proves that there is no solution for $N = 17$.
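The triangle constraint itself is easy to state in code. The following sketch checks a coloring for monochromatic triangles and searches all colorings exhaustively. Exhaustive search is hopeless for 3 colors and $N = 17$, so the usage example runs the 2-color analogue instead: $K_5$ admits a 2-coloring without monochromatic triangles and $K_6$ does not. The helper names are hypothetical, and this brute force is not the method of the paper.

```python
from itertools import combinations, product

def has_monochromatic_triangle(n, color):
    """color maps each edge (i, j), i < j, to a color; checks every triple of nodes."""
    return any(color[(i, j)] == color[(j, k)] == color[(i, k)]
               for i, j, k in combinations(range(n), 3))

def has_valid_coloring(n, num_colors):
    """Exhaustive search over all edge colorings of K_n (only feasible for tiny n)."""
    edges = list(combinations(range(n), 2))
    for values in product(range(num_colors), repeat=len(edges)):
        if not has_monochromatic_triangle(n, dict(zip(edges, values))):
            return True
    return False

print(has_valid_coloring(5, 2))  # True
print(has_valid_coloring(6, 2))  # False
```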

5.1 The cardinality constraint

In the previous problems we have considered simple orderings bearing on the values taken by the variables. We will now use another kind of ordering, which may be very useful for removing symmetries. These orderings use the cardinality constraint defined in [11]. This constraint is defined as follows.

If $\{x_0, x_1, \ldots, x_n\}$ is a finite subset of $V$, and $num$ is an integer variable:

$$num = count(d, \{x_0, x_1, \ldots, x_n\})$$

states that the number of variables in $\{x_0, x_1, \ldots, x_n\}$ taking the value $d$ is equal to the value of the variable $num$.

Proposition The count constraint is symmetrical.
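Here is a minimal sketch of the semantics of `count`. It is our own Python rendering, not the PECOS API. The constraint holds when `num` equals the number of `x` arguments whose value is `d`. Permuting the `x` arguments cannot change that, which is exactly the symmetry stated in the proposition.

```python
from itertools import permutations

def count(d, xs):
    """Number of entries of xs whose value is d."""
    return sum(1 for x in xs if x == d)

def count_holds(num, d, xs):
    """num = count(d, {x_0, ..., x_n}) for one assignment of values to the x_i."""
    return num == count(d, xs)

values = (2, 0, 2, 1, 2)
print(count(2, values))  # 3
# Permuting the x arguments never changes whether the constraint holds.
assert all(count_holds(3, 2, p) for p in permutations(values))
assert not any(count_holds(2, 2, p) for p in permutations(values))
```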

This constraint is less general than Van Hentenryck's cardinality operator [15], but it is more efficient, as shown in [11].


5.2 The symmetrical CSP

The Ramsey problem can be encoded with one variable $e_{i,j}$ representing the color of the edge from node $i$ to node $j$. The CSP is thus:

$$\{e_{i,j}\}_{0 \leq i,j \leq N-1} \in \{0, 1, 2\}$$

$$\forall i, j, k \text{ pairwise distinct}:\quad e_{i,j} \neq e_{j,k} \vee e_{j,k} \neq e_{k,i} \vee e_{k,i} \neq e_{i,j}$$

This representation is symmetrical since the count constraint is symmetrical: two nodes can be swapped. The corresponding permutations are the following:

$$\rho_{i,j}:\quad e_{i,k} \to e_{j,k},\quad e_{j,k} \to e_{i,k},\quad e_{k,l} \to e_{k,l} \text{ if } k, l \notin \{i, j\}$$

By theorem 1, any permutation of the nodes yields a consistent variable permutation.


5.3 The valid reduction of the CSP

In order to reduce this problem using ordering constraints, one must find quantities that are changed by one of these swaps. One such quantity is the number of edges of a given color $color$ leaving a node $i$. Let us call it $numcolor(i, color)$. By swapping nodes, one can obtain a solution where the number of color-0 edges leaving node 0 is the biggest among the $numcolor(i, color)$. Moreover, one can suppose that the values of $e_{0,j}$ are increasing.

The reduced CSP is then:

$$\{e_{i,j}\}_{0 \leq i,j \leq N-1} \in \{0, 1, 2\}$$

$$\forall i, j, k \text{ pairwise distinct}:\quad e_{i,j} \neq e_{j,k} \vee e_{j,k} \neq e_{k,i} \vee e_{k,i} \neq e_{i,j}$$

$$\forall i, j:\quad e_{i,j} = e_{j,i}$$

$$\forall i, color:\quad numcolor(i, color) = count(color, \{e_{i,0}, e_{i,1}, \ldots, e_{i,N-1}\})$$

$$\forall i, color:\quad numcolor(0, 0) \geq numcolor(i, color)$$

$$e_{0,1} \leq e_{0,2} \leq \cdots \leq e_{0,N-2} \leq e_{0,N-1}$$

Proposition This CSP is a valid reduction of the original Ramsey CSP.

Proof. If $\sigma$ is a solution of the Ramsey CSP, let $i_0$ be the rank of the node with the biggest number of edges of color 0. Then, by definition, we have:

$$\forall i, color:\quad numcolor(i_0, 0) \geq numcolor(i, color)$$

Swapping node $i_0$ with node 0, we can then permute the other nodes so that the edges starting from node 0 have increasing colors. Since any node permutation yields a consistent permutation of the variables, this completes the proof.
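The node relabeling used in this proof can be sketched as follows. We assume a coloring is stored as a symmetric matrix `e` whose diagonal is ignored. `reduce_form` moves a node with the most color-0 edges to position 0 and sorts the remaining nodes by the color of their edge to it. The sketch checks only the color-0 part of the ordering constraints. The helper names and the random test colorings are illustrative assumptions.

```python
import random

def numcolor(e, i, color):
    """Number of edges leaving node i (diagonal excluded) that have the given color."""
    return sum(1 for j in range(len(e)) if j != i and e[i][j] == color)

def relabel(e, order):
    """Apply a node permutation: new node a is old node order[a]."""
    n = len(e)
    return [[e[order[a]][order[b]] for b in range(n)] for a in range(n)]

def reduce_form(e):
    """Relabeling from the proof: a node with the most color-0 edges becomes node 0,
    and the other nodes are sorted by the color of their edge to it."""
    n = len(e)
    i0 = max(range(n), key=lambda i: numcolor(e, i, 0))
    others = sorted((j for j in range(n) if j != i0), key=lambda j: e[i0][j])
    return relabel(e, [i0] + others)

def random_symmetric_coloring(n, colors=3, seed=0):
    """Random symmetric edge coloring (not necessarily triangle-free), for testing only."""
    rng = random.Random(seed)
    e = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            e[i][j] = e[j][i] = rng.randrange(colors)
    return e

r = reduce_form(random_symmetric_coloring(8))
assert r[0][1:] == sorted(r[0][1:])  # e_{0,1} <= e_{0,2} <= ... <= e_{0,N-1}
assert all(numcolor(r, 0, 0) >= numcolor(r, i, 0) for i in range(8))
```

The relabeling is a node permutation, so it maps any solution of the Ramsey CSP to another solution. The random colorings here are not solutions; they only exercise the relabeling.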

This reduced CSP still has some symmetries. However, the number of symmetries is small enough for the problem to be solved in a reasonable amount of time.

6 Concluding remarks

6.1 Related work

The starting idea of our method is to remove symmetrical solutions by changing the CSP to be solved. Other recent work concentrates on a different idea: how can we design a smarter CSP solver that avoids testing permutations of a subproblem solution when that solution leads to a failure? This approach has been studied independently in [1] and [13]. In both approaches the solver analyses the CSP when a failure happens, and tries to recognize symmetries of the CSP related to the failure. In other words, they look for symmetries that leave the cause of the failure unchanged. The solver then avoids testing symmetrical domain assignments.

The main difficulty with these approaches is the recognition of symmetries. Solving this recognition problem in its most general form is at least as difficult as proving the unsatisfiability of a symmetrical CSP. Thus both papers propose an approximate solution: they use good heuristics that search for the most commonly encountered symmetries. Using these approaches, the Ramsey 17 problem has been solved independently of us.

We think that our approach is more efficient, although less automated, because we reduce the total number of potential solutions before searching for a solution. The next section provides evidence for this claim.

6.2 Results

This work was done during the implementation of a constraint programming library called PECOS, described in [5]. This library provides support for constrained variables, domains, and constraints as described in this paper. It is available either as a Lisp library or as a C++ library.

Another CSP is used as a benchmark for symmetries: the Schur problem. One has to place a given number $N$ of balls in 3 boxes. The balls are numbered from 1 to $N$. The ball numbered $i$ and the ball numbered $2i$ cannot be in the same box. The balls numbered $i$, $j$ and $i + j$ cannot all three be in the same box. With $N = 13$ there is a solution, and there is no solution for $N = 14$.
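Here is a plain backtracking sketch for the Schur problem as stated above: balls $1..N$ and 3 boxes. The $i$/$2i$ rule is the case $j = i$ of the $i$/$j$/$i+j$ rule, so one check covers both. The sketch uses no symmetry breaking at all. It only illustrates the constraints and the frontier between $N = 13$ and $N = 14$; it is not the PECOS model whose timings are reported below.

```python
def schur(n, boxes=3):
    """Place balls 1..n in `boxes` boxes so that no box contains i, j and i + j
    (i = j included, which covers the i / 2i rule). Returns box[k] for k = 1..n, or None."""
    box = [None] * (n + 1)  # box[0] is unused

    def fits(k, b):
        # Ball k is the largest placed so far, so only sums i + (k - i) = k need checking.
        return not any(box[i] == b and box[k - i] == b for i in range(1, k // 2 + 1))

    def place(k):
        if k > n:
            return True
        for b in range(boxes):
            if fits(k, b):
                box[k] = b
                if place(k + 1):
                    return True
                box[k] = None
        return False

    return box[1:] if place(1) else None

print(schur(13) is not None)  # True
print(schur(14))              # None
```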

All the CSPs discussed in this paper have been implemented using PECOS. We give the CPU time needed to solve these CSPs with the C++ version on a SPARCstation 2. Times for the other two approaches are taken from [13] and [1], which both use a SPARC workstation; thus the times can be compared. All times are given in seconds.


Problem                                  PECOS    [13]     [1]
ramsey 14 (one solution)                 1.07     4        2.82
ramsey 15 (one solution)                 ?        ?        50.73
ramsey 16 (one solution)                 ?        ?        ?
ramsey 17 (proof of unsatisfiability)    1.51     30
schur 13 (one solution)                  0.02     ?        0.05
schur 14 (proof of unsatisfiability)     0.14     ?        ?

6.3 Summary and future work

We have studied a method for removing symmetries in CSPs. This method relies on a theoretical analysis of the problem, based on a study of the consistent permutations of the variables of a problem. We then presented a general method, based on the notion of valid reductions of CSPs. This technique can be used to solve hard CSPs such as the Ramsey CSP.

However, our technique is not fully automated, since a valid reduction has to be chosen. We think that the best approach to study would be a combination of our technique with the ones described in [13] and [1]:

- The first stage would be the recognition of symmetries of the CSP by analysing the constraints of the CSP. This step would benefit from the heuristics used in [13] and [1].

- The second stage would be an automatic generation of additional ordering constraints yielding valid reductions of the CSP.

Theorem 3 suggests that the second step is not totally a dream.

References

1. Benhamou B., and Sais L., "Theoretical study of symmetries in propositional calculus and applications", in proceedings of CADE.

2. Caseau Y., and Puget J.-F., "Constraints on Order-Sorted Domains", submitted.

3. Deville Y., and Van Hentenryck P., "An Efficient Arc Consistency Algorithm for a Class of CSP Problems", in proceedings of IJCAI 1991.

4. Güsgen, Hertzberg, "Some fundamental properties of local propagation methods", Art. Int.

5. Ilog, PECOS manual, Gentilly.

6. Jaffar J., and Lassez J.-L., "Constraint Logic Programming", in proceedings of POPL 1987, Munich, January 87.

7. Laurière J.L., "A language and a program for stating and solving combinatorial problems", Art. Int. 10(1).

8. Mackworth A.K., "Consistency in networks of relations", Art. Int. 8, pp 99-118, 1977.

9. Mohr, Henderson, "Arc and path consistency revisited", Art. Int. 28, pp 225-233, 1986.

10. Mohr, Masini, "Good Old Discrete Relaxation", in proceedings of ECAI 1988.

11. Puget J.-F., "PECOS: a High Level Constraint Programming Language", in proceedings of SPICIS 92, Singapore, Sept 92.

12. Puget J.-F., "Programmation par contraintes orientée objet", in proceedings of the Tenth International Conference on Expert Systems and Applications, Avignon, June 92 (in French).

13. San Miguel Aguirre, "How to use symmetries in Boolean constraint solving", in A. Colmerauer and F. Benhamou, editors, Selected Papers on Constraint Logic Programming (to appear), MIT Press.

14. Van Hentenryck P., Constraint Satisfaction in Logic Programming, MIT Press, 1989.

15. Van Hentenryck P., and Deville Y., "The cardinality operator: A new logical connective for constraint logic programming", in proceedings of ICLP 91, pp 745-759.




A Logical Reconstruction of Constraint Relaxation Hierarchies in Logic Programming

Allen L. Brown, Jr.
Surya Mantha
Toshiro Wakayama

Webster Research Center, Xerox Corporation


Abstract. We propose an extension to Definite Horn Clauses by placing partial orders on the bodies of clauses. Such clauses are called relaxable clauses. These partial orders are interpreted as a specification of relaxation criteria in the proof of the consequent of a relaxable clause, i.e., the order in which to relax the conditions for the truth of the consequent if not all the goals in the body can be satisfied. We present a modal logic of preference that enables us to characterize these preference orders, both syntactically and semantically. The richer structure of the modal preference models reflects these preference orders, something that is absent in the essentially flat structure of traditional Herbrand models. A variant of SLD-resolution that generates solutions in the preferred order is presented. The notion of control as preference is introduced as a first step towards specifying control information in a logically coherent fashion. Relaxable Horn clauses can be used to succinctly specify constraint problems in formal design. It is worth noting that the development of preference logic was driven by the desire to characterize problems in document layout declaratively. In [4] we give a completely declarative account of the stable models of a general logic program. The reader is referred to [3], [5] and [14] for a detailed account of nonmonotonicity as preferential reasoning, the soundness and completeness proofs for the logics, and applications to Artificial Intelligence, such as deontic reasoning.


1 Motivations for Relaxable Specifications

One of the many attractions of logic programming is that it is an excellent executable specification language. Corresponding model-theoretic, fixed-point and operational semantics (via SLD-resolution) ensure that an atomic fact is in the denotation of a logic program if and only if it is in the success set of the program. The least Herbrand model of a logic program fully specifies the state of affairs, so to speak. Insofar as the purpose of a formal specification is to spell out the conditions under which certain statements are true, and also which statements are actually true, the language of Horn clause logic is sufficient (clauses and atomic facts do the respective jobs). At its core, however, a logic programming interpreter is a theorem prover, and this is manifested in the binary nature of the query mechanism. Given an atom as a query q and a specification (or program) P, the system returns true if q is a logical consequence of P, and either returns false or loops otherwise. A desirable feature of a formal specification system is that it should be possible to relax the requirements on the truth of a statement in a disciplined and semantically coherent way. The relaxation regime should be specifiable in a totally declarative fashion.

Before proceeding any further, we must formalize the notion of a constraint. For our purposes it is sufficient to view a constraint as a relationship among objects, or in other words, a predicate. We shall take a constraint to be a declarative description of a relation, along with methods to satisfy (or enforce) the relationship. These methods could be specified by procedures or declaratively, using Horn clauses. We shall work with Horn clauses. Consider the example in Figure 1. The system has four hard constraints: A + B = C, A ≥ 0, B ≥ 0, and C ≥ 0. The first constraint expresses the fact that the amount of a certain fluid F in container C is the sum of the amounts in A and B. The others state that the amount of F that each container holds is bounded below by 0. Let the overall constraint expressed by this example be ADD(A, B, C), and the rule

ADD(A, B, C) :- A + B = C, A ≥ 0, B ≥ 0, C ≥ 0.

Fig. 1. A Relaxable Constraint (containers A, B and C)

Suppose Figure 1 actually appears on the computer screen with the cursor at A. If we drag the cursor up (the semantics of this action being that more fluid is poured into A), how should the system react in order to satisfy the constraint ADD? As it stands, the system is underconstrained. It could either reduce the amount of fluid in B or increase the amount in C. We would prefer that C go up as A goes up (represented by the constraint INCREASING(A, C), whose definition we shall not worry about here). Suppose we also have a preference as to the amount of fluid that container C should ideally hold, say less than or equal to 10 units. These two constraints are not hard constraints, in the sense that they need not be satisfied under all circumstances. However, we prefer C ≤ 10 to INCREASING(A, C), because of reservations about the strength of container C (for instance!). Assume that the system starts in the initial state (6, 2, 8), i.e., A has 6 units, B 2 and C 8. By increasing A by 1, the resulting state of the system is described by the vector (7, 2, 9). The system was able to satisfy all six constraints on it. On further increasing A, we get the following trace: (8, 2, 10), (9, 1, 10), (10, 0, 10), (11, 0, 11). In the change from the vector (8, 2, 10) to (9, 1, 10), the system was not able to satisfy all the constraints. Because C ≤ 10 was preferred to INCREASING(A, C), the latter was sacrificed. The last vector shows a state where C ≤ 10 could not be satisfied. The fact that INCREASING(A, C) is now satisfied again is incidental.
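The trace above can be reproduced by ranking candidate states lexicographically: first by C ≤ 10, then by INCREASING(A, C). This Python sketch is our own illustration, not the preference logic developed in this paper. It makes three assumptions. INCREASING(A, C) is read as "C strictly grows when A grows". Only B values in 0..20 are searched. Remaining ties are broken by changing B as little as possible; that rule is not stated in the text, but it is needed to reproduce the trace.

```python
def step(state, delta_a=1, c_max=10, b_values=range(21)):
    """Next (A, B, C) after A grows by delta_a.

    Hard: A + B = C and A, B, C >= 0 (C is derived from A and B, so the sum always holds).
    Soft, strongest first: C <= c_max, then INCREASING(A, C), read here as "C grows".
    Tie-breaker (our assumption): change B as little as possible.
    """
    a, b, c = state
    new_a = a + delta_a
    best = None
    for new_b in b_values:
        new_c = new_a + new_b
        if min(new_a, new_b, new_c) < 0:
            continue
        score = (new_c <= c_max, new_c > c, -abs(new_b - b))
        if best is None or score > best[0]:
            best = (score, (new_a, new_b, new_c))
    return best[1]

state, trace = (6, 2, 8), []
for _ in range(5):
    state = step(state)
    trace.append(state)
print(trace)  # [(7, 2, 9), (8, 2, 10), (9, 1, 10), (10, 0, 10), (11, 0, 11)]
```

Python compares the score tuples element by element, so a stronger soft constraint always dominates any combination of weaker ones.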

How does one represent such knowledge? A rather attractive way that suggests itself is the following.

ADD(A, B, C) ←
    required: A + B = C,
    required: A ≥ 0,
    required: B ≥ 0,
    required: C ≥ 0,
    prefer: C ≤ 10,
    default: INCREASING(A, C).

This appealing syntax, where the goals in the body are annotated with their respective strengths, has been proposed in the literature [2]. Such formalisms are, however, ad hoc, because they do not give a declarative, model-theoretic semantics for such clauses. The logical status of such annotations is left unspecified. It should be noted that these proposals are an extension of the CLP framework, and that they attach annotations only to constraints, not to all the goals in the bodies of clauses. Moreover, viewed as formulae in logic, they do not denote. We propose a more general mechanism whereby preference orders are imposed on all the literals in the body of a clause.

2 Relaxation Hierarchies in Logic Programming

Our stated goal is to devise a language of relaxable specifications in which the relaxation criteria have due status in the model theory. Drawing inspiration from the annotated (with strengths such as required) concrete syntax presented above, we define a new category of clauses called relaxable clauses.

Definition 1. A relaxable clause is a triple (H, B, P), where H is an atom, B is a set of atoms and P is a partial order on B.

Thus, an ordinary clause is a relaxable clause with an empty partial order.

Definition 2. A relaxable logic program is a finite set of relaxable clauses.

Definition 3. If P is a partial order over a set S, then an element x of S is maximal in P iff there is no element y in S such that x < y.

Definition 4. Let C = (H, B, P) be a relaxable clause. The minimal support set of H in the context of C is max(P), where max(P) is the set of maximal elements in the partial order P.

What is the informal reading of a relaxable clause? An atomic query Q can be proved using a relaxable clause C = (H, B, P) of a relaxable program R if Q unifies with H, with θ being the mgu, and at least the maximal elements of P, i.e. θ(max(P)), can be proved using R. Ideally, we would like all of the goals in the body B to be provable. If that is not possible, the partial order P specifies the order in which the body can be relaxed. The satisfaction of a goal has higher priority than the satisfaction of all the goals that are lower than it in the partial order. Of course, the same information can be used for goal selection by the SLD interpreter, but more on that later. Consider the following (relaxable) logic program GP1 (for goal preference) with four clauses.

(p(X), {q, r(X), s(X)}, {s(X) < q, s(X) < r(X), r(X) < q})

(q, -, -). (s(2), -, -). (r(1), -, -).

What is the behavior of the above program given the goal ← p(Y)? Since the goal unifies with the head of the first clause (the only clause for p), we would ideally like to satisfy all the goals in the body. However, the satisfaction of the maximal elements ({q} in this case) is sufficient. Thus, both p(1) and p(2) are provable. The relaxation criteria, however, put a preference order over the two solutions to this query. Because r(X) is higher than s(X) in the partial order, the solution p(1) will be preferred to the solution p(2). Can a similar behavior be achieved using definite Horn clauses? A first approximation is the following logic program CP1 (for clause preference).

p(X) ← q, r(X), s(X).
p(X) ← q, r(X).
p(X) ← q, s(X).
p(X) ← q.
q.
s(2).
r(1).

The information about the partial order in the first clause of GP1 is captured by the textual order of the p-clauses in CP1. We shall call this the clause preference translation.

Definition 5. The clause preference translation of a relaxable program GP1 is the union of the ordered sets obtained by the clause preference translation of each relaxable clause in GP1.
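When the body is totally ordered, as in GP1, the clause preference translation can be generated mechanically. In this sketch the partial order is encoded as a set of pairs `(x, y)` meaning x < y; that encoding and the function names are our own assumptions. `maximal` computes the minimal support set of Definition 4. Every generated body contains that set, and the bodies are ordered so that keeping a more preferred goal comes first. The output reproduces the p-clauses of CP1.

```python
from itertools import product

def maximal(elements, less):
    """Maximal elements (Definition 3); `less` holds pairs (x, y) meaning x < y."""
    return [x for x in elements if not any((x, y) in less for y in elements)]

def linear_extension(elements, less):
    """Sort from most to least preferred by counting strictly greater elements
    (a valid linear extension when `less` is transitively closed)."""
    return sorted(elements, key=lambda g: sum((g, y) in less for y in elements))

def clause_preference_translation(head, ordered_body):
    """Ordinary clauses for a relaxable clause with a totally ordered body.

    ordered_body[0] is the unique maximal goal and is always kept; the other goals
    are dropped in lexicographic order, so keeping a more preferred goal comes first.
    """
    top, rest = ordered_body[0], ordered_body[1:]
    clauses = []
    for keep in product([True, False], repeat=len(rest)):
        goals = [top] + [g for g, k in zip(rest, keep) if k]
        clauses.append(f"{head} <- {', '.join(goals)}.")
    return clauses

body = ["q", "r(X)", "s(X)"]
less = {("s(X)", "q"), ("s(X)", "r(X)"), ("r(X)", "q")}
print(maximal(body, less))  # ['q']
for clause in clause_preference_translation("p(X)", linear_extension(body, less)):
    print(clause)
# p(X) <- q, r(X), s(X).
# p(X) <- q, r(X).
# p(X) <- q, s(X).
# p(X) <- q.
```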


We have a total order in our example. A partial order is more cumbersome to represent using the try order of clauses. Moreover, this information has no status in the least Herbrand model of CP1 (if CP1 is considered to be an ordinary, unordered collection of clauses), {p(1), p(2), q, s(2), r(1)}, which is a flat structure. The fact that p(1) is preferred over p(2) is not represented in the semantics; it is solely a consequence of the textual order of the clauses. It is this added intensionality that we would like to capture explicitly in a logic.

Another possible translation of the relaxable logic program GP1 is into PP1 (for program preference), a set of four logic programs {π_i | 1 ≤ i ≤ 4}. The set of clauses {q, s(2), r(1)} is common to all the π_i. They differ in the definition of the predicate p: π_1 has the first p-clause of CP1, π_2 the second, and so on. We now have four competing logic programs to choose from when solving a query. What is required to complete the translation is a fifth component that places a preference order among these programs: an arbiter, so to speak. The framework of logic programming is not rich enough to enable us to express such preferences and thus complete the translation into PP1. The reader will have noticed that what these translations reveal is that relaxable specifications can be looked upon as either goal preference, clause preference or program preference logic programs, where the partial order is on the goals, clauses and programs respectively.

3  Preference  as  a  Modality 

The notion of preference is fundamental to computing. As should be obvious, the notion of minimality derives from it, and the search for minimality is a recurring theme in artificial intelligence and theoretical computer science. Preference has been studied quite extensively in such diverse areas as decision theory, operations research, economics, ethics, and philosophical logic [11], [8], [15], [16], [7], [9].

As a first approximation, a logic of preference should concern itself with the study of preference principles acceptable upon abstract and formal grounds rather than upon any particular theory of preferability determination. Competing, and mutually inconsistent, theories of preference have been proposed by economists, mathematicians and philosophical logicians, among others. The work in economics and utility theory [11], [13] relates qualitative notions of preference to quantitative notions of probability and desirability, but places rather stringent conditions on the preference relation, i.e., that it be an irreflexive, asymmetric and transitive relation. In fact, much of the controversy and debate in this whole area has been around the question: what are the logical properties of preference? Much has been written for and against the transitivity of preference [10], [6]. Even such a seemingly innocuous property as asymmetry has come under criticism [1].

Elsewhere [5], [14] we have argued about what the right granularity of preference should be. In almost all work in philosophical logic, utility theory and economics, preference is taken to be a binary relation among propositions, i.e., those objects that can be represented by sentences of the underlying logical language. The scope of preference is thus extremely local. Having committed to such




a fine level of granularity, one has to make rather strong commitments as to what logical properties the preference relation enjoys under all circumstances. The use of preference in nonmonotonic reasoning has seen a drastic shift from the very local to the very global. Thus, in circumscription, we talk about truth in all minimal models. The inference rules of default logic and other nonmonotonic logics appeal to certain global conditions as well. Such a shift has had a disastrous impact on the computability of preference. It is ironic that this should have happened in computer science, of all disciplines.

Consider the preference proposition $pPq$. According to the usual reading, it means that p is preferred to q. It can reasonably be claimed, however, that p and q do not obtain in a vacuum. Thus, $pPq$ may mean that p is preferred to q independently of everything else, or that p is preferred to q given that everything else is the same, or some intermediate point between these two extremes. In our view, the binary preference relations between propositions are secondary and can be derived from the preference orderings among possible worlds. The fundamental relation of preference is among possible worlds, i.e., those objects that are described by a possibly infinite collection of propositions. What is important to keep in mind is that we do not take individual sentences of the logical language to describe the states of affairs over which the preference relation ranges. This makes it difficult to characterize the preference relation using a binary relation P, because it is not convenient to talk explicitly about the referents of this relation. We need a syntactic means of characterizing these preference orders among possible worlds; in short, we seek a modal operator for preference that would play the role that $\Box$ does in alethic modal logic.

3.1 Logic of Feasible Preference $\mathcal{P}_1$

$\mathcal{P}_1$ is a modal logic of two relations that interact with each other. The motivation for this interaction is to capture the intuition that in order to get to the optimal (or best) world, one needs to be able to talk about worlds that are feasible from the standpoint of the current world. The motivation underlying this whole enterprise is to devise a formal language and logic in which optimization problems can be stated precisely. Thus, if $w_2$ is feasible from $w_1$, and $w_1 \preceq w_2$ and $w_2 \not\preceq w_1$, then it is possible to move from the solution $w_1$ to $w_2$. If, however, $w_2$ were not feasible from $w_1$, then it would not be possible to move from $w_1$ to $w_2$ even if $w_2$ were preferred to $w_1$. This interaction between the two relations is fundamental to modeling any situation that is of computational interest, in particular all search-based computation. Drawing an analogy from classical mathematics, feasibility corresponds to continuity of the domain of computation (just as preference corresponds to maximization).
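The interplay of feasibility and preference can be pictured as local search. The Python sketch below is only an illustration under our own assumptions: feasibility is an adjacency map, and a hypothetical numeric score stands in for the preference order (a strictly higher score means strictly preferred). A move is allowed only to a feasible, strictly preferred world, so a better world that is not feasible from the current one is never reached.

```python
def climb(start, feasible, score):
    """Follow strictly improving feasible moves until none is left (a local optimum).

    feasible: dict mapping a world to the worlds feasible from it (the relation R).
    score:    hypothetical numeric stand-in for the preference order.
    """
    path = [start]
    while True:
        here = path[-1]
        better = [v for v in feasible.get(here, ()) if score[v] > score[here]]
        if not better:
            return path
        path.append(max(better, key=score.get))  # greedily take the best feasible improvement


# World c scores highest, but it is not feasible from a or from b.
FEASIBLE = {"a": ["b"], "b": ["a"], "c": ["a"]}
SCORE = {"a": 1, "b": 2, "c": 3}
print(climb("a", FEASIBLE, SCORE))  # ['a', 'b']
```

Since the score strictly increases along a path and there are finitely many worlds, the loop always terminates.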

Syntax: We add to the language $\mathcal{L}_M$ of a normal modal logic, equipped with the modal operators $\Box$ and $\Diamond$, a new unary modal operator $P_f$ and its associated formation rule, i.e., if $A$ is a formula then $P_f A$ is a formula.

Semantics: A $\mathcal{P}_1$ preference frame $\mathcal{M}$ is a triple of the form $\langle W, \mathcal{R}, \preceq \rangle$ where $W$ and $\mathcal{R}$ are as in standard Kripke frames and $\preceq$ is a binary relation over $W \times W$ which is a subset of $\mathcal{R}$. A $\mathcal{P}_1$ preference model is a $\mathcal{P}_1$ preference frame with a valuation function $V$ that determines the truth of atomic formulae at individual worlds. Assuming the usual valuation of formulae at possible worlds, the semantics of the modal operators is given by:

- $w \models \Box F$ iff $\forall v \in W,\ w \mathcal{R} v \rightarrow v \models F$

- $w \models P_f F$ iff $\forall v \in W,\ (v \models F \wedge w \mathcal{R} v) \rightarrow w \preceq v$

Axiomatics: We assume that $\mathcal{P}_1$ is equipped with all the axiom schemes and rules of standard propositional logic, and the normal modal logic K. In addition, we have the following axioms and rule.

PPS: $\vdash \Box\neg A \rightarrow P_f A$
PIR: $\vdash P_f A \rightarrow \neg A$
T: $\vdash \Box A \rightarrow A$

PI: if $\vdash (A_1 \wedge \cdots \wedge A_n) \rightarrow A$ then $\vdash (P_f\neg A_1 \wedge \cdots \wedge P_f\neg A_n) \rightarrow P_f\neg A$, for $n \geq 0$

PIR is valid in the class of preference frames with an irreflexive preference relation. T is valid in the class of preference frames with a reflexive feasibility relation. PIR and T are needed to show the completeness of $\mathcal{P}_1$. Some axioms and rules that are valid are:

PI*: $P_f(A \rightarrow B) \rightarrow P_f B$
PA: $(P_f A \wedge P_f B) \rightarrow P_f(A \wedge B)$
PF: $P_f \bot$
PGN: $P_f A \rightarrow P_f(A \wedge B)$
PST: $P_f(A \vee B) \rightarrow P_f A$
PEQ: if $\vdash (A \leftrightarrow B)$ then $\vdash (P_f A \leftrightarrow P_f B)$
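Because $P_f$ quantifies only over feasible worlds, the valid schemes above can be sanity-checked by brute force. The following Python sketch assumes the semantic clause for $P_f$ given under Semantics ($w \models P_f F$ iff every feasible $F$-world $v$ satisfies $w \preceq v$), enumerates every frame on two worlds with $\preceq \subseteq \mathcal{R}$ together with every valuation of two atoms $A$ and $B$, and tests a few schemes at every world. Passing on two-world frames is evidence, not a proof of validity; the encoding (worlds as integers, extensions as sets) is our own.

```python
from itertools import product

N = 2  # number of worlds; every frame on N worlds is enumerated
WORLDS = range(N)
ALL = frozenset(WORLDS)


def frames():
    """Yield every (R, pref) with pref a subset of R: each pair is absent, in R only, or in both."""
    pairs = [(u, v) for u in WORLDS for v in WORLDS]
    for choice in product((0, 1, 2), repeat=len(pairs)):
        R = {p for p, c in zip(pairs, choice) if c >= 1}
        pref = {p for p, c in zip(pairs, choice) if c == 2}
        yield R, pref


def box(R, w, ext):
    """w |= []F iff every R-successor of w lies in ext, the extension of F."""
    return all(v in ext for v in WORLDS if (w, v) in R)


def pf(R, pref, w, ext):
    """w |= P_f F iff every feasible F-world v satisfies w <= v."""
    return all((w, v) in pref for v in ext if (w, v) in R)


def implies(p, q):
    return (not p) or q


# Each scheme maps (R, pref, extension of A, extension of B, world) to a truth value.
SCHEMES = {
    "PPS": lambda R, P, A, B, w: implies(box(R, w, ALL - A), pf(R, P, w, A)),
    "PI*": lambda R, P, A, B, w: implies(pf(R, P, w, (ALL - A) | B), pf(R, P, w, B)),
    "PA": lambda R, P, A, B, w: implies(pf(R, P, w, A) and pf(R, P, w, B), pf(R, P, w, A & B)),
    "PF": lambda R, P, A, B, w: pf(R, P, w, frozenset()),
    "PGN": lambda R, P, A, B, w: implies(pf(R, P, w, A), pf(R, P, w, A & B)),
    "PST": lambda R, P, A, B, w: implies(pf(R, P, w, A | B), pf(R, P, w, A)),
}


def valid(scheme):
    """True if the scheme holds at every world of every frame and valuation on N worlds."""
    valuations = list(product((False, True), repeat=N))
    for R, pref in frames():
        for a_bits, b_bits in product(valuations, repeat=2):
            A = frozenset(w for w in WORLDS if a_bits[w])
            B = frozenset(w for w in WORLDS if b_bits[w])
            if not all(scheme(R, pref, A, B, w) for w in WORLDS):
                return False
    return True


for name, scheme in SCHEMES.items():
    print(name, valid(scheme))
```

The same checker can be pointed at PIR; on unrestricted frames it fails, which is consistent with PIR being valid only under the extra frame conditions stated above.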

The language of $\mathcal{P}_1$ is rich enough to allow us to express general preference principles. We refer the reader to [14], [5] for details. In the rest of this paper, we work with the first-order version of $\mathcal{P}_1$ with fixed domains across worlds and with terms as rigid designators.

4  Horn  Preferential  Theories 

Central to our enterprise will be the notion of a preferential theory. We first give some informal motivational remarks. Consider the typical structure of an optimization problem. The following components can be immediately identified: a set of constraints that have to be satisfied in all solutions, the space of solutions, and a set of preference criteria that picks out one or more solutions from among the space of solutions as optimal solutions. The language of preference logic is expressive enough to enable us to specify all three components in a succinct fashion. A preferential theory is, informally, the statement of an optimization problem in the language of preference logic.

Definition 6. A preference clause is the universal closure of a formula of the form $\bigwedge_i A_i \rightarrow P_f(\bigwedge_j L_j)$ where the $L_j$ and $A_i$ are general literals, i.e., literals (possibly) adorned with $\Box$ and $\Diamond$.

Preference clauses are sufficient for most purposes. The treatment of preferential theories with iterated modalities is left to a later date.


Definition 7. An arbiter is a finite collection of preference clauses.




Definition 8. A preferential theory is the conjunction of a first-order theory $T$ and an arbiter $\mathcal{A}$.

Definition 9. A preinterpretation for a language $\mathcal{L}$ is a set $D$ (the domain of interpretation; we restrict ourselves to the single-sorted case) and a mapping from terms (including variables) to $D$.

Definition 10. Given a preinterpretation $I'$, an $I'$-based interpretation $I$ is $I'$ together with a mapping from ($n$-place) predicate symbols to subsets of $D^n$.

Definition 11. Given a preinterpretation $I'$, an $I'$-based preferential structure is a structure $\mathcal{M} = \langle W, \mathcal{R}, \preceq, V \rangle$ such that $V$ assigns $I'$-based interpretations to members of $W$.

In this paper, we shall be interested in a particular class of preferential theories called Horn preferential theories. Further, the only preinterpretation of interest to us will be the Herbrand preinterpretation, i.e., the free interpretation of terms.

Definition 12. Let $\{\pi_i \mid i \in I\}$ be a finite collection of definite Horn theories (standard definitions from [12]). Thus each $\pi_i$ is a conjunction of definite Horn clauses. Let $\rho$ be a definite Horn theory as well. A modular theory $T_m$ is defined to be:

$$T_m = \rho \wedge \bigwedge_{j} \Diamond\Big(\pi_j \wedge \neg\big(\bigvee_{k \neq j} \pi_k\big)\Big)$$

where $j$ and $k$ range over $I$.

Intuitively, $\rho$ corresponds to the fixed part of a specification. Thus $\rho$ must be satisfied at all worlds in all models in which the modular theory is true. The various $\pi_j$ are dispensable. Let us call them the transients and $\rho$ the invariant. The latter conjunct in the above formula for a modular theory imposes the following condition on the models of a modular theory.

Lemma 13. If $\mathcal{M}$ is a preference model of a modular theory $T_m$ (i.e., $\mathcal{M} \models T_m$), then

- For every world $w$ in $\mathcal{M}$, and for every transient $\pi_i$ in $T_m$, there exists some $w$-feasible world $v$ such that $v \models \pi_i$ and $v \not\models \pi_j$ for all $j \neq i$.

Definition 14. A solution to a modular theory is given by $\rho \wedge \pi_i$, for any $i$ in $I$.

We shall assume that $\pi_i \not\models \pi_j$ for $i \neq j$. Also, ... for any $i$. Of course, this is quite easy to arrange by introducing new dummy predicates that are unique to the respective programs.

Given a modular theory that has $n$ transients (and thus $n$ solutions), we would like to be able to specify preference criteria that determine optimal solutions. A set of preference criteria acts as an arbiter in determining optimal solutions. Let us denote the arbiter by $\mathcal{A}$. We take the arbiter to be a finite set of preference clauses.




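Definition 15. A Horn preferential theory is of the form

$$\rho \wedge \Big(\bigwedge_{j} \Diamond\big(\pi_j \wedge \neg(\bigvee_{k \neq j} \pi_k)\big)\Big) \wedge \mathcal{A}$$

where $\mathcal{A}$ is an arbiter and the first part is a modular theory.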

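Definition 16. Let $I'$ be a preinterpretation and let $\mathcal{M}_1$ and $\mathcal{M}_2$ be two $I'$-based preferential structures. $\mathcal{M}_1 \preceq \mathcal{M}_2$ iff

1. There exists $\phi$, a one-to-one mapping from $W_1$ to $W_2$, such that $\forall w \in W_1,\ V_2(\phi(w)) \supseteq V_1(w)$.

2. $\mathcal{R}_1 \supseteq \mathcal{R}_2|_{\phi(W_1 \times W_1)}$, the restriction of $\mathcal{R}_2$ to $\phi(W_1 \times W_1)$ (by a minor abuse of notation).

3. If equality holds in the above condition, then $\preceq_1 \subseteq \preceq_2|_{\phi(W_1 \times W_1)}$.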

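Lemma 17. Let $T_p$, given by

$$\rho \wedge \Big(\bigwedge_{j} \Diamond\big(\pi_j \wedge \neg(\bigvee_{k \neq j} \pi_k)\big)\Big) \wedge \mathcal{A},$$

be a Horn preferential theory. Let $H$ be the Herbrand preinterpretation over the language of the preferential theory. There is a unique $H$-based $\preceq$-minimal preferential structure $\mathcal{M}_H$ such that $\mathcal{M}_H \models T_p$, namely $\mathcal{M}_H = \langle W_H, \mathcal{R}_H, \preceq_H, V_H \rangle$ where

- $W_H = \{w_1, \ldots, w_n\}$, where $n$ is the cardinality of the solution space,
- $\mathcal{R}_H = W_H \times W_H$,
- $V_H(w_i) = M_{\rho \wedge \pi_i}$, where $M_{\rho \wedge \pi_i}$ is the least Herbrand model of $\rho \wedge \pi_i$,
- $\preceq_H$ is the smallest subrelation of $\mathcal{R}_H$ that satisfies the arbiter.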

The  intended  preference  model  of  a  Horn  preferential  theory  is  its  least  preference 
model. 

5 Relaxable Logic Programs as Horn Preferential Theories

We now have the logical machinery to complete the program preference translation that we began in an earlier section. We shall use preference logics to specify the fifth component of our translation of relaxable logic programs. Consider the program preference translation of the first clause of our example relaxable program $GP_1$. We shall generalize this construction to a set of such relaxable clauses. Let $GC_1$ denote


    (p(X), {q, r(X), s(X)}, {s(X) < q, s(X) < r(X), r(X) < q})




This gives rise to the four clauses

    C1: p(X) ← q, r(X), s(X).      C2: p(X) ← q, r(X).
    C3: p(X) ← q, s(X).            C4: p(X) ← q.

Let $\pi_i$ be the program with the clause $C_i$. What is left to be specified is the arbiter that imposes the partial order among these clauses. The purpose of the arbiter is to perform preferential maximization in the context of the truth of the predicate p(X) for all X. The arbiter for the first clause of $GP_1$ is

$p(X) \wedge r(X) \wedge \neg s(X) \rightarrow P_f(p(Y) \wedge r(Y) \wedge s(Y))$
$p(X) \wedge \neg r(X) \wedge \neg s(X) \rightarrow P_f(p(Y) \wedge s(Y))$
$p(X) \wedge \neg r(X) \rightarrow P_f(p(Y) \wedge r(Y))$

The  arbiter  clauses  use  the  partial  order  on  the  body  of  the  relaxable  clause  to 
specify  which  p  worlds  are  better. 

Definition 18. Let $GC = (H, B, PO)$ be a relaxable clause. Let $B_i$ be an element of $B$. $C(B_i, GC)$ is the constant set of $B_i$ while proving $H$ in the context of $GC$. It is given by the set $\{B_j \mid B_j \in B,\ B_j \text{ nonmaximal},\ B_i < B_j\}$.

The constant set $C(g, GC)$ for a goal $g$ in a relaxable clause $GC$ gives the goals in the body of $GC$ that are preferred over $g$ in establishing $H$, the head of $GC$. Thus, their truth has to be maintained while stating that the $H$-worlds where $g$ is true are better than $H$-worlds where $g$ is not true. However, we do not care if the goals in the constant set that were false become true in the process of moving to the better world. This motivates the division of $C(g, GC)$ into two complementary subsets $C_{gp}$ and $C_{gn}$. The arbiter clause scheme for an atom $g$ while proving $H$ in the context of $GC$ is then given by:

$$H \wedge \Big(\bigwedge_{g_i \in C_{gp}} g_i\Big) \wedge \Big(\bigwedge_{g_j \in C_{gn}} \neg g_j\Big) \wedge \neg g \rightarrow P_f\Big(H' \wedge \big(\bigwedge_{g_i \in C_{gp}} g_i'\big) \wedge g'\Big)$$

where the primed literals on the right-hand side indicate that the variables of the left-hand side have been replaced by fresh variables. Thus, there are no variables in common between the left-hand sides and the right-hand sides. If the constant set $C(g, GC)$ for a goal $g$ in a relaxable clause $GC$ has $n$ elements, then there are $2^n$ arbiter clauses corresponding to $g$, because $C(g, GC)$ has $2^n$ subsets.
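The clause scheme can be instantiated mechanically. The Python sketch below is an illustration only: goals are plain strings, the relation $<$ of $PO$ is a set of pairs (a, b) read as "b is preferred to a", priming is modeled by the hypothetical helper `prime`, which renames X to Y, and we generate arbiter clauses only for nonmaximal goals (here q is maximal; it occurs in all four clauses C1 to C4). Under these assumptions it reproduces the three arbiter clauses of $GC_1$ and yields $2^n$ clauses for a constant set of size $n$.

```python
from itertools import combinations


def prime(literal):
    """Hypothetical renaming to fresh variables: X becomes Y."""
    return literal.replace("X", "Y")


def constant_set(goal, body, po):
    """Definition 18: nonmaximal body goals strictly preferred to `goal`.
    po holds pairs (a, b) meaning a < b."""
    nonmaximal = {a for a, _ in po}
    return [b for b in body if b in nonmaximal and (goal, b) in po]


def arbiter_clauses(head, goal, const):
    """One clause per split of the constant set into C_gp (true) and C_gn (false)."""
    clauses = []
    for size in range(len(const) + 1):
        for c_gp in combinations(const, size):
            c_gn = [g for g in const if g not in c_gp]
            lhs = [head, *c_gp, *("~" + g for g in c_gn), "~" + goal]
            rhs = [prime(head), *(prime(g) for g in c_gp), prime(goal)]
            clauses.append(" & ".join(lhs) + " -> P_f(" + " & ".join(rhs) + ")")
    return clauses


BODY = ["q", "r(X)", "s(X)"]
PO = {("s(X)", "q"), ("s(X)", "r(X)"), ("r(X)", "q")}
NONMAXIMAL = {a for a, _ in PO}

for g in BODY:
    if g in NONMAXIMAL:
        for clause in arbiter_clauses("p(X)", g, constant_set(g, BODY, PO)):
            print(clause)
```

For s(X) the constant set is {r(X)}, which yields two clauses; for r(X) it is empty, which yields one.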

We generalize the above program preference translation for a single relaxable clause to the equivalent translation for a relaxable logic program, i.e., a finite set of relaxable clauses. Let $GP$ be a relaxable logic program with $n$ clauses $GC_1, \ldots, GC_n$. Let the translation of the relaxable clause $GC_i$ give the clause set $S_i = \{C_{i1}, \ldots, C_{ik_i}\}$ of cardinality $k_i$, and the arbiter $\mathcal{A}_i$. The preference logic program $T_{pref}$ corresponding to $GP$ is given by


$$\rho \wedge \Big(\bigwedge_{j} \Diamond\big(\pi_j \wedge \neg(\bigvee_{k \neq j} \pi_k)\big)\Big) \wedge \mathcal{A}$$

where $\rho$ is empty, $\mathcal{A}$ is the set-theoretic union of the various $\mathcal{A}_i$, and each $\pi_j$ is obtained by picking one clause from each of the $n$ clause sets $S_i$. Thus, the number of competing $\pi_j$ is given by $\prod_{i=1}^{n} k_i$. The equivalent program preference logic program for our original example has $1 \leq j \leq 4$. The corresponding $\pi_j$ are given by

    π1 = {p(X) ← q, r(X), s(X).   q.   s(2).   r(1).}
    π2 = {p(X) ← q, r(X).   q.   s(2).   r(1).}
    π3 = {p(X) ← q, s(X).   q.   s(2).   r(1).}
    π4 = {p(X) ← q.   q.   s(2).   r(1).}
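Picking one clause from each $S_i$ is a Cartesian product, which is why the number of competing programs is $\prod_i k_i$. The Python sketch below writes clauses as plain strings and assumes that each fact of the example forms its own singleton clause set (so that $\rho$ is empty); it enumerates the four programs $\pi_1, \ldots, \pi_4$. The helper name `competing_programs` is ours.

```python
from itertools import product
from math import prod

# Clause sets S_i for the running example: the relaxable p-clause has four
# alternatives; each fact is assumed to form a singleton clause set.
CLAUSE_SETS = [
    ["p(X) <- q, r(X), s(X).", "p(X) <- q, r(X).", "p(X) <- q, s(X).", "p(X) <- q."],
    ["q."],
    ["s(2)."],
    ["r(1)."],
]


def competing_programs(clause_sets):
    """Each program picks exactly one clause from every clause set."""
    return [list(choice) for choice in product(*clause_sets)]


programs = competing_programs(CLAUSE_SETS)
assert len(programs) == prod(len(s) for s in CLAUSE_SETS)  # 4 * 1 * 1 * 1 = 4
for j, program in enumerate(programs, start=1):
    print(f"pi_{j} = {{{' '.join(program)}}}")
```

With several relaxable clauses the product grows multiplicatively, which is worth keeping in mind when the translation is used as an implementation strategy.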

The arbiter $\mathcal{A}$ is the arbiter $\mathcal{A}_1$ given above, since the arbiters for the other clauses are empty. We add distinct dummy predicates to the $\pi_j$ so that the mutual exclusiveness of the $\pi_j$ is satisfied. The following is true.

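Theorem 19. Let $T_p$ be the Horn preferential theory

$$\Big(\bigwedge_{j} \Diamond\big(\pi_j \wedge \neg(\bigvee_{k \neq j} \pi_k)\big)\Big) \wedge \mathcal{A}, \qquad 1 \leq j \leq n.$$

The preference modal structure that is the least model of $T_p$ is $\langle W, \mathcal{R}, \preceq, V \rangle$ such that

1. the cardinality of $W$ is $n$,

2. $\mathcal{R}$ is the universal relation,

3. for each $\pi_j$ there exists a world $w_j$ such that $V(w_j) = M_{\pi_j}$, where $M_{\pi_j}$ is the least Herbrand model of $\pi_j$ (over the language of $T_p$); and

4. $\preceq$ is the least relation on $W$ induced by the arbiter $\mathcal{A}$; i.e., for any two worlds $w_1$ and $w_2$, $w_1 \preceq w_2$ iff there exists a ground instance of an arbiter clause

$$A_i \wedge B_i \rightarrow P_f(A_j \wedge B_j)$$

such that $w_1 \models A_i \wedge B_i$ and $w_2 \models A_j \wedge B_j$.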
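Condition 4 can be checked directly on the running example. The following Python sketch makes several assumptions of its own: the dummy predicates are omitted, the Herbrand universe is taken to be {1, 2}, each world is represented by the least Herbrand model of the corresponding $\pi_j$, and the arbiter clauses of $GC_1$ are encoded as lists of signed unary literals. Only the clause $p(X) \wedge \neg r(X) \rightarrow P_f(p(Y) \wedge r(Y))$ has both sides satisfiable here, so it alone generates the order; read literally, condition 4 also relates $w_4$ to itself, because $w_4$ satisfies both sides.

```python
from itertools import product

# Least Herbrand models of pi_1..pi_4, dummy predicates omitted; atoms are tuples.
FACTS = {("q",), ("s", 2), ("r", 1)}
WORLDS = {
    "w1": FACTS,                          # p(X) <- q, r(X), s(X): no p-atom is derivable
    "w2": FACTS | {("p", 1)},             # p(X) <- q, r(X)
    "w3": FACTS | {("p", 2)},             # p(X) <- q, s(X)
    "w4": FACTS | {("p", 1), ("p", 2)},   # p(X) <- q
}
UNIVERSE = (1, 2)  # assumed Herbrand universe

# Arbiter of GC_1: (left-hand literals on X, right-hand literals on fresh Y);
# each literal is (positive?, predicate).
ARBITER = [
    ([(True, "p"), (True, "r"), (False, "s")], [(True, "p"), (True, "r"), (True, "s")]),
    ([(True, "p"), (False, "r"), (False, "s")], [(True, "p"), (True, "s")]),
    ([(True, "p"), (False, "r")], [(True, "p"), (True, "r")]),
]


def holds(model, literals, value):
    """Truth of a conjunction of unary literals with the variable bound to value."""
    return all(((pred, value) in model) == positive for positive, pred in literals)


def induced_order(worlds, arbiter, universe):
    """Pairs (u, v) such that some ground arbiter instance has its left side true
    at u and its right side true at v (condition 4)."""
    order = set()
    for (lhs, rhs), x, y in product(arbiter, universe, universe):
        for u, model_u in worlds.items():
            for v, model_v in worlds.items():
                if holds(model_u, lhs, x) and holds(model_v, rhs, y):
                    order.add((u, v))
    return order


print(sorted(induced_order(WORLDS, ARBITER, UNIVERSE)))
# [('w3', 'w2'), ('w3', 'w4'), ('w4', 'w2'), ('w4', 'w4')]
```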

At this point, a comparison between the clause preference and program preference translations is in order. We saw earlier that in the clause preference translation, the partial order on the bodies of relaxable clauses was captured intensionally by the textual order of the translated clauses. The program preference translation enables us to characterize these preference orders syntactically via the arbiter. Further, the richer modal preference models reflect these preference orders.

Lemma 20. Let $GP$ be a relaxable logic program, $CP$ its clause preference translation and $PP$ its program preference translation. Let $M_{CP}$ be the least Herbrand model of $CP$ and $\mathcal{M}_{PP} = \langle W, \mathcal{R}, \preceq, V \rangle$ be the least modal preference model of $PP$. Then

$$M_{CP} = \ldots$$

The  partial  order  among  the  worlds,  however,  gives  information  about  which 
solutions  are  preferred.  The  preference  order  motivates  the  following  definition 
of  an  interpreter  that  computes  the  most  preferred  solutions. 




Definition 21. Let $GP$ be a goal preference logic program and $g(x)$ be a goal. Let $W$ be the set of worlds in the least preference model of the program preference translation of $GP$. $I$ is a preferential interpreter if $I(GP, g(x))$ is the set of all ground substitutions $\theta$ such that there exists a world $w \in W$ where $w \models (g(x))\theta$, and for every $w' \in W$ with $w \preceq w'$ and $w' \not\preceq w$, either $w' \models (g(x))\theta$ or the extension of $g$ at $w'$ is empty.

Thus,  the  preferential  interpreter  would  return  X  =  1  as  the  preferred  solution 
for  the  goal  p(X)  in  our  example  program. 
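A direct reading of Definition 21 for a unary goal can be sketched in Python as follows. The world extensions and the order are hard-coded assumptions for the running example (dummy predicates omitted, Herbrand universe {1, 2}); the order consists of the pairs induced by the arbiter clause $p(X) \wedge \neg r(X) \rightarrow P_f(p(Y) \wedge r(Y))$. The function name `preferential_answers` is ours, not the paper's.

```python
# Extension of p at each world of the running example (dummy predicates omitted).
EXT_P = {"w1": set(), "w2": {1}, "w3": {2}, "w4": {1, 2}}
# Pairs (u, v) meaning u <= v, induced by p(X) & ~r(X) -> P_f(p(Y) & r(Y)).
ORDER = {("w3", "w2"), ("w3", "w4"), ("w4", "w2"), ("w4", "w4")}


def strictly_better(u, v, order):
    """v is strictly preferred to u: u <= v and not v <= u."""
    return (u, v) in order and (v, u) not in order


def preferential_answers(ext, order):
    """Values a such that some world w satisfies g(a) and every world strictly
    better than w either satisfies g(a) or has an empty g-extension."""
    answers = set()
    for w, values in ext.items():
        for a in values:
            if all(a in ext[v] or not ext[v] for v in ext if strictly_better(w, v, order)):
                answers.add(a)
    return answers


print(preferential_answers(EXT_P, ORDER))  # {1}
```

X = 2 is rejected because every world containing p(2) has the strictly better world $w_2$, whose nonempty p-extension does not contain 2.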

6  Conclusions 

In this paper we motivated and presented a modal logic of preference and applied it to the problem of constraint relaxation in logic programming. The model-theoretic analysis of relaxable Horn clauses given in this paper has been extended to include a fixpoint semantics as well as an operational semantics based on a variant of standard SLD-resolution. The details can be found in [14]. Preference logic has many potential applications in knowledge representation and symbolic computing, including formal planning, abduction and diagnosis. Preference logics represent a first step towards bringing methods in artificial intelligence closer to classical methods in decision and utility theory. Deep connections between preference and probability exist [11], and they have to be investigated in the context of our framework. Comparisons with meta-logical approaches to the control of deductions also await analysis. We are confident that preference logics provide a unifying framework for the various philosophical logics used in artificial intelligence.


References 

1. Ackermann, R. Comments on N. Rescher's semantic foundation for the logic of preference. In The Logic of Decision and Action. 1967.

2. Borning, A., et al. Constraint hierarchies and logic programming. In Sixth International Conference on Logic Programming (June 1989), pp. 149-164.

3. Brown Jr., A. L., Mantha, S., and Wakayama, T. Preferences as normative knowledge: Towards declarative obligations. In First International Workshop on Deontic Logic in Computer Science (Amsterdam, The Netherlands, 1991), J. J. C. Meyer and R. J. Wieringa, Eds., pp. 142-164.

4. Brown Jr., A. L., Mantha, S., and Wakayama, T. Preference logics and nonmonotonicity in logic programming. In Logic at Tver, International Conference on Logical Foundations of Computer Science (Tver, Russia, 1992), A. Nerode, Ed., Springer-Verlag.

5. Brown Jr., A. L., Mantha, S., and Wakayama, T. Preference logics: Towards a unified approach to nonmonotonicity in deductive reasoning. In Second International Symposium on Artificial Intelligence and Mathematics (Ft. Lauderdale, Florida, 1992).

6. Fishburn, P. Intransitive indifference in preference theory: A survey. Operations Research 18 (1970).

7. Hallden, S. The logic of better. Lund, 1957.

8. Hansson, B. Fundamental axioms for preference relations. Synthese 18 (1968).

9. Hansson, S. O. A new semantical approach to the logic of preference. Erkenntnis 31 (1989), 1-42.

10. Hughes, R. I. G. Rationality and intransitive preferences. Analysis 40.3 (1980).

11. Jeffrey, R. C. The Logic of Decision. University of Chicago Press, Chicago, 1983.

12. Lloyd, J. Foundations of Logic Programming. Springer-Verlag, New York, 1984.

13. Luce, R. D., and Raiffa, H. Games and Decisions. 1957.

14. Mantha, S. First-order preference theories and their applications. Tech. rep., Dept. of Computer Science, University of Utah, 1992.

15. von Wright, G. H. The Logic of Preference. University of Edinburgh Press, Edinburgh, Scotland, 1963.

16. von Wright, G. H. The logic of preference reconsidered. Theory and Decision (1972), 55-67.


A  Performance  Evaluation 
of  Backtrack-Bounded  Search  Methods 
for  N-ary  Constraint  Networks 


Pierre  Berlandier 

SECOIA Project, INRIA-CERMICS,

2004, Route des Lucioles, B.P. 93, 06902 Sophia-Antipolis Cedex


Abstract. An early study on the structural aspect of binary constraint problems has led to the definition of a backtrack bounded solving (BBS) method. In this paper, we apply this method to n-ary constraint problems and we try to weigh the benefits that can be expected from its use. Simple functions are described to implement BBS for acyclic n-ary problems and results of an experimental performance evaluation are given.


1  Introduction 


Graphs are the most natural structures to represent and interpret constraint problems. Several research works focused on this structural aspect and led to the elaboration of efficient methods for solving sparse problems. The seminal ones are backtrack free [1] and backtrack bounded [2] search methods. They are regarded as attractive techniques by review papers such as [3] or [4]. However, they have seldom been implemented for experimental evaluation and they are most often ignored in the design of operational constraint interpretation systems. This lack of interest is surprising given that they can solve some problems in polynomial time. Also noteworthy is the fact that all the structure-based search methods (SBSM) were designed for, and have remained within, the restricted framework of binary constraint problems.

These observations prompted us to seek and weigh the true benefits that can be expected from the use of SBSM for the satisfaction of n-ary constraint problems. This paper presents our experiments and conclusions on this subject. We first review the existing relationships between the structure of the problem, its level of partial consistency and the complexity of finding the solutions, within the framework of binary constraint problems. Then, we propose to map the ideas behind these relationships to n-ary constraint networks, which, in fact, represent the majority of real-life problems. We describe simple functions to implement a backtrack bounded search for acyclic problems. Finally, we use these results to set up an experimental comparison between the most common solving scheme (based on forward checking guided by the dynamic search rearrangement heuristic) and a solving scheme involving an SBSM.




2  Basics 

The usual constraint formalism derives from [5]. It presents a constraint problem as a tuple $(X, \mathcal{D}, D, C, \mathcal{R}, R)$ where $X = \{x_1, \ldots, x_n\}$ is the set of the $n$ variables of the problem. $\mathcal{D}$ is a set of finite domains and $D$ is a bijection from $X$ to $\mathcal{D}$ such that $D(x_i)$ is the domain of $x_i$. $C = \{c_1, \ldots, c_m\}$ is a set of $m$ constraints where $c_i$ is a tuple (a pair, for binary problems) of variables $(x_{i_1}, \ldots, x_{i_p})$ from $X$. Finally, $\mathcal{R}$ is a set of relations and $R$ is a bijection from $C$ to $\mathcal{R}$ such that $R(c_i)$ is the relation attached to $c_i$, which defines the set of $p$-tuples allowed by this constraint.

The constraint satisfaction problem (CSP) consists in finding one or more substitutions $V$ of all the variables in $X$ such that:

$$\forall x_i \in X,\ V(x_i) \in D(x_i) \quad \text{and} \quad \forall c_i \in C,\ (V(x_{i_1}), \ldots, V(x_{i_p})) \in R(c_i)$$

Finding these globally consistent substitutions is NP-complete. The most common way to reduce the complexity is to apply a preliminary filtering step to the problem in order to enforce its partial consistency.
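As a baseline, the definition can be read directly as generate-and-test. The Python sketch below (the names `solutions`, `VARIABLES` and so on are ours) represents each constraint as a pair of a variable tuple and a set of allowed value tuples, mirroring $c_i$ and $R(c_i)$. It enumerates the full Cartesian product of the domains, which is exactly the exponential search that filtering tries to cut down. The small instance, with one binary and one ternary constraint, is hypothetical.

```python
from itertools import product


def solutions(variables, domains, constraints):
    """Yield every substitution V with V(x) in D(x) for each x and
    (V(x_i1), ..., V(x_ip)) in R(c) for each constraint c = (scope, relation)."""
    for values in product(*(sorted(domains[x]) for x in variables)):
        v = dict(zip(variables, values))
        if all(tuple(v[x] for x in scope) in relation for scope, relation in constraints):
            yield v


VARIABLES = ["x1", "x2", "x3"]
DOMAINS = {x: {1, 2, 3} for x in VARIABLES}
VALUES = range(1, 4)
CONSTRAINTS = [
    (("x1", "x2"), {(a, b) for a in VALUES for b in VALUES if a < b}),       # binary
    (("x1", "x2", "x3"), {(a, b, c) for a in VALUES for b in VALUES
                          for c in VALUES if a + b == c}),                   # ternary
]

print(list(solutions(VARIABLES, DOMAINS, CONSTRAINTS)))  # [{'x1': 1, 'x2': 2, 'x3': 3}]
```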

2.1 Partial Consistency

Enforcing partial consistency consists in finding and excluding some partial substitutions of $X$ that are known to be incompatible with the constraint set. Inconsistencies can be detected at different levels by considering larger and larger subsets of the problem's variables.

A general consistency concept for binary constraint problems is $k$-consistency [1]. A problem is $k$-consistent if, given $k - 1$ variables with consistent values, we can always complete this set with any other $k$-th variable of the problem along with a value in its domain so that all the constraints between these $k$ variables are satisfied. A problem is said to be strongly $k$-consistent if it is $j$-consistent for every $j \leq k$.
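For small binary problems, the definition of $k$-consistency can be tested by brute force. The Python sketch below is illustrative only (its cost is exponential in $k$); constraints are given as a dictionary from variable pairs to Python predicates, a representation we chose for brevity. The instance is hypothetical: equality constraints between $x_1$ and $x_2$ and between $x_1$ and $x_3$, first with loose domains and then with every domain reduced to {2}.

```python
import operator
from itertools import combinations, product


def consistent(assign, constraints):
    """True if every constraint whose two variables are both assigned is satisfied."""
    return all(rel(assign[a], assign[b]) for (a, b), rel in constraints.items()
               if a in assign and b in assign)


def k_consistent(domains, constraints, k):
    """Any consistent assignment of k-1 variables extends to any other k-th variable."""
    names = list(domains)
    for subset in combinations(names, k - 1):
        for values in product(*(domains[x] for x in subset)):
            assign = dict(zip(subset, values))
            if not consistent(assign, constraints):
                continue
            for y in names:
                if y not in assign and not any(
                        consistent({**assign, y: v}, constraints) for v in domains[y]):
                    return False
    return True


def strongly_k_consistent(domains, constraints, k):
    """j-consistent for every j <= k."""
    return all(k_consistent(domains, constraints, j) for j in range(1, k + 1))


EQ = {("x1", "x2"): operator.eq, ("x1", "x3"): operator.eq}
before = {"x1": {1, 2, 3}, "x2": {0, 1, 2}, "x3": {2, 3, 4}}
after = {"x1": {2}, "x2": {2}, "x3": {2}}
print(k_consistent(before, EQ, 2), strongly_k_consistent(after, EQ, 3))  # False True
```

In the first instance the value 1 of $x_1$ has no partner in $D(x_3)$, which is what breaks 2-consistency.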

2-consistency, also called arc-consistency (AC) [6], is mostly used in practice because it has empirically proven to offer the best tradeoff between the amount of search effort and problem simplification. Moreover, since the inconsistent substitutions it detects are singletons, the arc-consistency process only shrinks the domains of the variables; it does not generate any new constraints.
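Arc consistency is commonly enforced with Mackworth's AC-3 procedure. The Python sketch below is our own illustration, not the implementation evaluated in this paper: it counts every value-pair consistency check, and the initial arc order is a hypothetical choice (the count depends on it). On an instance with equality constraints between $x_1$ and $x_2$ and between $x_1$ and $x_3$ and domains {1, 2, 3}, {0, 1, 2} and {2, 3, 4}, it only deletes values, never adds constraints, and ends with every domain reduced to {2}.

```python
import operator
from collections import deque


def revise(domains, xi, xj, rel, counter):
    """Remove values of xi with no support in D(xj); count each check.
    Return True if D(xi) shrank."""
    kept = set()
    for vi in sorted(domains[xi]):
        for vj in sorted(domains[xj]):
            counter[0] += 1
            if rel(vi, vj):
                kept.add(vi)
                break
    changed = kept != domains[xi]
    domains[xi] = kept
    return changed


def ac3(domains, constraints, arcs):
    """constraints maps each directed arc (xi, xj) to a relation on (vi, vj);
    arcs fixes the initial queue order. Returns the number of checks performed."""
    counter = [0]
    queue, queued = deque(arcs), set(arcs)
    while queue:
        xi, xj = queue.popleft()
        queued.discard((xi, xj))
        if revise(domains, xi, xj, constraints[(xi, xj)], counter):
            # D(xi) shrank: arcs pointing at xi must be re-examined.
            for xk, xl in constraints:
                if xl == xi and xk != xj and (xk, xl) not in queued:
                    queue.append((xk, xl))
                    queued.add((xk, xl))
    return counter[0]


domains = {"x1": {1, 2, 3}, "x2": {0, 1, 2}, "x3": {2, 3, 4}}
constraints = {(a, b): operator.eq for a, b in
               [("x1", "x2"), ("x2", "x1"), ("x1", "x3"), ("x3", "x1")]}
checks = ac3(domains, constraints, [("x1", "x3"), ("x1", "x2"), ("x2", "x1"), ("x3", "x1")])
print(checks, domains)  # 18 {'x1': {2}, 'x2': {2}, 'x3': {2}}
```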

In [2], the concept of $k$-consistency is extended to $(i, j)$-consistency, which means that given $i$ variables with consistent values, we can find values for $j$ other variables such that the constraints between the $i + j$ variables are satisfied. Again, the problem is strongly $(i, j)$-consistent if it is $(k, j)$-consistent for every $k \leq i$. As for 2-consistency, $(1, j)$-consistency can be achieved simply by removing some values from the variables' domains.

After partial consistency is enforced, finding global solutions to the CSP still entails a traversal of the search space, which is usually conducted by a backtrack search procedure. The instantiation order of the variables is a crucial parameter of the search. The following sections are concerned with finding an efficient one.




2.2  Backtrack  Free  Search 

The works reported in [1] and then in [7] show that the complexity of the CSP is strongly related to the structure of its associated graph. As expected, the sparser the graph, the easier the problem. The results of these works are of important practical interest, as they provide a means to solve some problems in reasonable polynomial time. They allow one to determine the consistency level to enforce, together with the sequence of variables that leads to a greedy instantiation process.

A central concept of constraint graph analysis is the width of a variable. In a given sequence of variables, it is defined as the number of variables that share a constraint with the variable in question and that precede it in the sequence. The width of a sequence is the maximum width of its variables, and the width of a constraint graph is the minimum width of its possible sequences. Figure 1 shows three possible sequences for the binary constraint graph $\{(x_1, x_2), (x_1, x_3)\}$ and their respective widths (the instantiation proceeds top-down).


Fig.  1.  Widths 
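The width definitions translate directly into code. In the Python sketch below (names are ours), the graph width is computed by trying every ordering, which is only practical for tiny graphs; it is meant to check the definitions on the graph $\{(x_1, x_2), (x_1, x_3)\}$, not as an efficient algorithm.

```python
from itertools import permutations


def sequence_width(order, edges):
    """Maximum, over the variables, of the number of adjacent variables placed before them."""
    pos = {x: i for i, x in enumerate(order)}

    def width(x):
        return sum(1 for a, b in edges
                   if (a == x and pos[b] < pos[x]) or (b == x and pos[a] < pos[x]))

    return max(width(x) for x in order)


def graph_width(variables, edges):
    """Minimum width over all orderings (exhaustive)."""
    return min(sequence_width(p, edges) for p in permutations(variables))


EDGES = [("x1", "x2"), ("x1", "x3")]
for seq in [("x1", "x2", "x3"), ("x2", "x1", "x3"), ("x2", "x3", "x1")]:
    print(seq, sequence_width(seq, EDGES))
print("graph width:", graph_width(["x1", "x2", "x3"], EDGES))  # graph width: 1
```

Placing $x_1$ last gives width 2, because both of its neighbours precede it; any sequence in which $x_1$ is not last has width 1.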


Theorem 1 of [1] states that, when the level of strong consistency of a problem is greater than the width of its graph, there exists a sequence of variables that leads to backtrack-free search. This result appeals to intuition: when a problem is strongly $k$-consistent, it means that we can instantiate up to $k$ variables of the problem without possible failure. Moreover, when a variable with width $w$ is instantiated, we have to check the constraints it shares with at most $w$ preceding variables, which is similar to the consistency check of $w + 1$ variable values. Therefore, if we have $k > w$, we are sure that there is always a consistent choice for any variable in the sequence.


2.3  Directed  Consistency 

A tree is a graph of width 1. Indeed, if we choose an arbitrary vertex as the root,
orient every edge away from it and apply a topological sort to the resulting
directed acyclic graph, we obtain a sequence where each variable is preceded by at
most one of its adjacent variables (its parent), that is, a sequence of width 1.
Solving a tree-structured problem without backtracking thus only requires strong
2-consistency.
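One concrete way to obtain such a sequence, sketched below in Python as an illustration (the helper names are hypothetical), is a breadth-first traversal from the chosen root: it is one particular topological sort of the edges oriented away from the root. The second function measures the width of the resulting order to confirm that it is 1.

```python
from collections import deque


def tree_order(root, edges):
    """Breadth-first order of a tree from `root`. This is one topological
    sort of the edges oriented away from the root, so every vertex except
    the root is preceded by exactly one neighbour: its parent."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for n in sorted(adjacency.get(v, ())):  # sorted only for determinism
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return order


def max_preceding_neighbours(order, edges):
    """Width of `order`: for each vertex, count neighbours placed before it."""
    position = {v: i for i, v in enumerate(order)}
    return max(
        sum(1 for a, b in edges
            if v in (a, b) and position[b if a == v else a] < position[v])
        for v in order
    )


edges = [("a", "b"), ("a", "c"), ("c", "d"), ("c", "e")]
order = tree_order("d", edges)
print(order)                                   # ['d', 'c', 'a', 'e', 'b']
print(max_preceding_neighbours(order, edges))  # 1
```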

Nevertheless, on closer inspection, we realize that full arc-consistency
does more work than required. In fact, it is sufficient to enforce directed
arc-consistency (DAC) following the order of instantiation [7]. This means that
for two variables $x_i$ and $x_j$ such that $x_i$ precedes $x_j$, it is sufficient that for every
value $v_i \in D(x_i)$ there exists a compatible value $v_j \in D(x_j)$. The converse is not
necessary, since the value of $x_j$ is always chosen after that of $x_i$ and there are
no implicit constraints between $v_i$ and $v_j$. Consider, for example, the leftmost
sequence of Fig. 1 and suppose that the arcs represent equality constraints
and that $D(x_1) = \{1, 2, 3\}$, $D(x_2) = \{0, 1, 2\}$ and $D(x_3) = \{2, 3, 4\}$.
Achieving directed arc-consistency costs 12 consistency checks and only reduces
the domain of $x_1$ to $\{2\}$, whereas full arc-consistency costs 18 consistency checks
and reduces all domains to that singleton. In the general case, whereas the worst-case
complexity of AC is $O(md^3)$, that of DAC is $O(md^2)$ (where $m$ is the number of
constraints and $d$ is the size of the largest domain).
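The check count of the example can be reproduced with a short Python sketch. It assumes that the leftmost sequence of Fig. 1 is $(x_1, x_2, x_3)$, that each consistency check is one call to the compatibility predicate, and that the scan for a support stops at the first compatible value; the function names are hypothetical, and the predicate is assumed to be symmetric, as equality is.

```python
def revise(domains, xi, xj, compatible):
    """Keep only the values of xi that have a compatible value in D(xj).
    Returns the number of consistency checks performed; the inner scan
    stops at the first supporting value."""
    checks, kept = 0, []
    for vi in domains[xi]:
        for vj in domains[xj]:
            checks += 1
            if compatible(vi, vj):
                kept.append(vi)
                break
    domains[xi] = kept
    return checks


def directed_arc_consistency(order, edges, domains, compatible):
    """Enforce DAC along `order`: visit variables from last to first and,
    for every constraint linking x_j to an earlier x_i, revise D(x_i)
    against D(x_j). The reverse direction is never revised."""
    position = {v: k for k, v in enumerate(order)}
    checks = 0
    for xj in reversed(order):
        for a, b in edges:
            if xj in (a, b):
                xi = b if a == xj else a
                if position[xi] < position[xj]:
                    checks += revise(domains, xi, xj, compatible)
    return checks


domains = {"x1": [1, 2, 3], "x2": [0, 1, 2], "x3": [2, 3, 4]}
edges = [("x1", "x2"), ("x1", "x3")]
checks = directed_arc_consistency(["x1", "x2", "x3"], edges, domains,
                                  lambda u, v: u == v)
print(checks, domains["x1"])  # 12 [2]
```

Only $D(x_1)$ shrinks: $D(x_2)$ and $D(x_3)$ keep their unsupported values, which is harmless because they are instantiated after $x_1$.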

2.4  Backtrack  Bounded  Search 

In [2], the width definition is extended to the $j$-width, which is the minimum of the
widths of groups of 1 to $j$ consecutive variables. As for the regular width, the
$j$-width of a sequence is the maximum of the $j$-widths of its variables, and
the $j$-width of the graph is the minimum $j$-width over its possible sequences. The
characterization of backtrack-free search is generalized to backtrack-bounded
search: when a graph is strongly $(i, j)$-consistent and $i$ is equal to the $j$-width
of the graph, there exists an instantiation sequence such that backtracking
is limited to $j - 1$ instantiated variables.

Now, let $b$ be the size of the largest biconnected component of a constraint
graph. It is shown in [2] that the $(b - 1)$-width of this graph is 1. Therefore, the
corresponding problem can be solved in time exponential in $b$ after its
$(1, b - 1)$-consistency has been enforced. This is probably the most useful result
of the study of backtrack-bounded search, and also the source of inspiration for our
application to $n$-ary constraints, presented below.
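The quantity $b$ is cheap to compute. As an illustration (our choice of algorithm, not one prescribed by [2]), the Python sketch below uses the classic Hopcroft-Tarjan depth-first search with an edge stack to list the vertex sets of the biconnected components; the function names are hypothetical, and the recursive search assumes graphs small enough for Python's default recursion limit.

```python
def biconnected_components(vertices, edges):
    """Vertex sets of the biconnected components of a simple undirected
    graph, using the Hopcroft-Tarjan depth-first search with an edge stack."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    index, low, edge_stack, components = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)          # discovery number
        for w in adjacency[u]:
            if w == parent:
                continue
            if w not in index:                  # tree edge
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:          # u separates w's subtree
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (u, w):
                            break
                    components.append(component)
            elif index[w] < index[u]:           # back edge to an ancestor
                edge_stack.append((u, w))
                low[u] = min(low[u], index[w])

    for v in vertices:
        if v not in index:
            dfs(v, None)
    return components


def largest_biconnected_size(vertices, edges):
    """b in the text: size of the largest biconnected component
    (taken as 1 for a graph without edges)."""
    return max((len(c) for c in biconnected_components(vertices, edges)),
               default=1)


V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
print(largest_biconnected_size(V, E))  # 3
```

In this example the triangle $\{a, b, c\}$ is the largest biconnected component, while the tail $c$-$d$-$e$ contributes two components of size 2.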

3  N-ary  Constraint  Problems 

So far, we have learned how to recognize easy binary constraint problems and how
to solve them efficiently by first enforcing a certain level of (directed) consistency
and then following a given sequence of variables during instantiation.

However, real-life problems are rarely expressed entirely with binary constraints.
Of course, it is always possible to translate an $n$-ary constraint problem
into its binary equivalent [8], but this translation is often costly and results
in a bigger problem which may be less tractable than the original one. That
is why we sought to adapt the previous theorems on SBSM to $n$-ary constraint
networks.




3.1  Principle 

N-ary constraint problems are the straightforward extension of binary ones.
Their representation is likewise a generalization of graphs, namely hypergraphs.
Figure 2(a) shows the representation of a sample $n$-ary constraint problem.


Fig.  2.  A  hypergraph  of  constraints. 


For an acyclic constraint hypergraph $C$, it is always possible to compute a
sequence of constraints $S$ such that any given constraint shares at most one
variable with the union of all the constraints that precede it in $S$. Let us
suppose that the following proposition holds:

$\forall c_i \in C,\ \forall x_{i_j} \in c_i,\ \forall v \in D(x_{i_j}),\ \exists (r_1, \dots, r_p) \in R(c_i) : r_j = v \qquad (1)$

Let $c_1$ be the first constraint of $S$. The previous proposition implies that a
consistent instantiation of its variables can be found. Now, let us suppose that
all the variables of the constraints preceding a given $c_i$ of $S$ are consistently
instantiated. From the definition of $S$, we can assert that at most one variable
of $c_i$ is instantiated. Under this condition, proposition (1) implies that we can
always find values for the non-instantiated variables of $c_i$ so that it is satisfied.

This shows by induction that a complete consistent instantiation can be found
without backtracking across the variables of distinct constraints. The only
backtracking that may occur is during the search for a consistent instantiation of the
variables of a single constraint. We can therefore conclude that the amount of backtracking
is restricted to $|c_i| - 1$ variables for each successive constraint $c_i$. The worst-case
complexity of the whole instantiation process is thus $O(md^a)$, where $a$ is
the maximal arity of the constraints in $C$.
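The inductive argument translates directly into a greedy loop. The Python sketch below is an illustration under stated assumptions: each constraint is encoded (hypothetically) as a pair of a variable tuple and an explicit set of allowed value tuples, the list is already ordered as $S$, and proposition (1) holds. The only search is the scan over the tuples of a single relation, which mirrors the $O(d^a)$ cost per constraint.

```python
def solve_along_sequence(sequence, domains):
    """Instantiate the constraints of S one after the other.

    `sequence` lists (variables, relation) pairs in the order of S, where
    `relation` is the set of allowed value tuples. Assumes that every
    constraint shares at most one variable with its predecessors and that
    proposition (1) holds, so the scan below never fails and no choice made
    for an earlier constraint is ever undone."""
    assignment = {}
    for variables, relation in sequence:
        for values in sorted(relation):  # search within one constraint only
            candidate = dict(zip(variables, values))
            if all(v in domains[x] and assignment.get(x, v) == v
                   for x, v in candidate.items()):
                assignment.update(candidate)
                break
        else:
            return None  # only reachable if the assumptions are violated
    return assignment


S = [(("x", "y"), {(1, 2), (2, 3)}),
     (("y", "z"), {(2, 5), (3, 6)})]
D = {"x": {1, 2}, "y": {2, 3}, "z": {5, 6}}
print(solve_along_sequence(S, D))  # {'x': 1, 'y': 2, 'z': 5}
```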

Moreover, the remark on directed consistency made in Section 2.3 applies
here again. The level of partial consistency described by proposition (1) is stronger
than what is actually required. It is sufficient to make sure that:

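$\forall c_i \in S,\ \{x_{i_j}\} = c_i \cap \bigcup \{c_k \in S \mid c_k \prec c_i\},\ \forall v \in D(x_{i_j}),\ \exists (r_1, \dots, r_p) \in R(c_i) : r_j = v$

where $c_k \prec c_i$ means that $c_k$ precedes $c_i$ in $S$.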




This consistency level can be achieved simply by filtering each constraint $c_i \in S$,
following the reverse order of the sequence $S$. The complexity of this process is
also $O(md^a)$. Therefore, an acyclic $n$-ary constraint problem can be solved with
complexity $O(md^a)$.
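A minimal sketch of this reverse-order filtering, using the same hypothetical (variables, relation) encoding as a plain Python structure: processing $S$ backwards guarantees that, when $c_i$ is filtered, every later constraint has already pruned the variables it shares with $c_i$, so the surviving tuples of $c_i$ are exactly those usable by a greedy forward pass.

```python
def directed_filter(sequence, domains):
    """Filter the constraints of S from last to first (updates `domains`
    in place). Each relation first loses the tuples that use pruned values;
    then the single variable it shares with earlier constraints keeps only
    the values that still have a supporting tuple.
    Returns the filtered sequence."""
    filtered = list(sequence)
    for i in range(len(sequence) - 1, -1, -1):
        variables, relation = sequence[i]
        earlier = set().union(*(set(vs) for vs, _ in sequence[:i]))
        live = {t for t in relation
                if all(v in domains[x] for x, v in zip(variables, t))}
        filtered[i] = (variables, live)
        for j, x in enumerate(variables):
            if x in earlier:  # at most one such variable, by definition of S
                domains[x] = domains[x] & {t[j] for t in live}
    return filtered


S = [(("x", "y"), {(1, 1), (2, 2)}),
     (("y", "z"), {(1, 3)})]
D = {"x": {1, 2}, "y": {1, 2}, "z": {3}}
F = directed_filter(S, D)
print(D["y"], F[0][1])  # {1} {(1, 1)}
```

Note that $D(x)$ keeps the value 2: as with DAC, only the connecting variable of each constraint is made consistent, and the forward pass simply picks among the surviving tuples.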


3.2  Implementation 

In the general case, a constraint problem is not entirely arborescent. It is often
composed of small treelike subproblems surrounding one bigger cyclic
subproblem. It is thus recommended to combine the best search mechanism we
know for the cyclic part with an SBSM for the treelike parts. Moreover, given that
the cyclic part is likely to be the most difficult part of the problem, it should be
solved first so that it is solved only once.

The constraint tree processing we propose can be implemented by three simple
functions. The first one extracts the branches of the problem and returns a
reverse topological sort of the constraints that compose those branches. It proceeds
by stripping the leaves one by one from the graph, a leaf being a constraint
that shares at most one attribute with the other constraints. This function can
be defined as follows:

function strip(C)
  let tree = ();
  while C ≠ ∅
    do let leaves = {c ∈ C | |c ∩ ⋃(C \ {c})| ≤ 1};
       if leaves = ∅ then return tree
       else tree ← append(tree, leaves);
            C ← C \ leaves;
  return tree
end strip
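A possible Python transcription of strip is sketched below. It assumes that constraints are represented by the frozensets of their variables; as a convenience not present in the pseudocode, it also returns the remaining cyclic core. The example hypergraph is hypothetical and only loosely modeled on the situation of Fig. 2(a).

```python
def strip(constraints):
    """Repeatedly remove the leaves, i.e. the constraints that share at
    most one variable with the union of the other remaining constraints.
    Returns the stripped constraints in removal order (leaves first)
    together with the remaining cyclic core."""
    remaining = set(constraints)
    tree = []
    while remaining:
        leaves = {c for c in remaining
                  if len(c & set().union(*(remaining - {c}))) <= 1}
        if not leaves:
            break                                # only a cyclic core is left
        tree.extend(sorted(leaves, key=sorted))  # deterministic order
        remaining -= leaves
    return tree, remaining


c1 = frozenset({"x1", "x2"})
c2 = frozenset({"x1", "x2", "x3"})
c3 = frozenset({"x3", "x4", "x5"})
c4 = frozenset({"x5", "x6", "x7"})
c5 = frozenset({"x5", "x8", "x9"})
tree, core = strip([c1, c2, c3, c4, c5])
print([sorted(c) for c in tree])
# [['x5', 'x6', 'x7'], ['x5', 'x8', 'x9'], ['x3', 'x4', 'x5']]
```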

Once we have extracted the treelike parts, we can enforce their directed
consistency. As we said before, this merely consists of filtering the constraints in
the order returned by the strip function. Finally, also from the sequence of
constraints, we can build the sequence of variables to use for instantiation. This
is the role of the following function:

function sequence(tree, C)
  let sequence = ();
  while tree ≠ ()
    do let c = head(tree);
       tree ← tail(tree); C ← C \ {c};
       for all x ∈ c
         do if ¬∃c' ∈ C, x ∈ c'
            then sequence ← cons(x, sequence)
  return sequence;
end sequence
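The sequence function can likewise be sketched in Python, again assuming frozenset constraints and a hypothetical example. A `deque` stands in for the list built with `cons`: a variable is pushed on the front once the last constraint containing it has been removed, so variables shared with the cyclic core never appear (they are instantiated with the core) and the variables nearest the leaves end up last.

```python
from collections import deque


def sequence(tree, constraints):
    """Remove the constraints of `tree` one at a time; every variable that
    no longer occurs in any remaining constraint is pushed on the front of
    the result (the `cons` of the pseudocode)."""
    remaining = set(constraints)
    result = deque()
    for c in tree:
        remaining.discard(c)
        for x in sorted(c):  # sorted only for determinism
            if not any(x in other for other in remaining):
                result.appendleft(x)
    return list(result)


c1 = frozenset({"x1", "x2"})
c2 = frozenset({"x1", "x2", "x3"})
c3 = frozenset({"x3", "x4", "x5"})
c4 = frozenset({"x5", "x6", "x7"})
c5 = frozenset({"x5", "x8", "x9"})
print(sequence([c5, c4, c3], [c1, c2, c3, c4, c5]))
# ['x5', 'x4', 'x7', 'x6', 'x9', 'x8']
```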




In the hypergraph $C$ of Fig. 2(a), the cyclic part is composed of $c_1$ and $c_2$,
while the treelike part is composed of $c_3$, $c_4$ and $c_5$. Calling strip($C$) returns either
$(c_4, c_5, c_3)$ or $(c_5, c_4, c_3)$, which are functionally equivalent for our purpose. Then,
calling sequence($(c_5, c_4, c_3)$, $C$) may return the sequence $(x_4, x_5, x_6, x_7, x_8, x_9)$,
which conforms to the tree structure and can be used as a static instantiation
order.


4  Experimental  Evaluation 

The goal of this experimental evaluation is to weigh the benefits that can be
expected from the use of the tools presented in the previous section. As a reference
for efficiency, we chose the best general solving scheme we know, which
combines backtracking with forward checking and follows a dynamic instantiation
order (DO) termed dynamic search rearrangement. This ordering heuristic
consists of selecting as the next variable to be instantiated the variable that has
the smallest domain among the remaining ones. The literature [9, 10, 11] generally
agrees that it gives the best overall performance in average-case analysis,
experimentally as well as analytically.
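For readers unfamiliar with this reference scheme, the Python sketch below shows backtracking with forward checking and the smallest-domain-first dynamic ordering on binary constraints. It is our own minimal illustration, not the implementation used in the experiments; the constraint encoding (a dictionary from ordered variable pairs to predicates) is an assumption, and no check or backtrack counters are kept.

```python
def solve_fc_do(domains, constraints):
    """Backtracking with forward checking and a dynamic ordering (DO):
    the next variable is the unassigned one with the smallest current
    domain. `constraints` maps an ordered pair (x, y) to a predicate
    on (value_of_x, value_of_y)."""

    def consistent(x, vx, y, vy):
        if (x, y) in constraints and not constraints[(x, y)](vx, vy):
            return False
        if (y, x) in constraints and not constraints[(y, x)](vy, vx):
            return False
        return True

    def search(assignment, current):
        if len(assignment) == len(current):
            return dict(assignment)
        unassigned = [v for v in current if v not in assignment]
        x = min(unassigned, key=lambda v: len(current[v]))  # DO heuristic
        for vx in current[x]:
            # Forward checking: prune the future domains against x = vx.
            pruned = {y: [vy for vy in current[y] if consistent(x, vx, y, vy)]
                      for y in unassigned if y != x}
            if all(pruned.values()):  # no future domain became empty
                assignment[x] = vx
                result = search(assignment, {**current, **pruned, x: [vx]})
                if result is not None:
                    return result
                del assignment[x]
        return None  # backtrack

    return search({}, {v: list(d) for v, d in domains.items()})


def different(u, v):
    return u != v


triangle = {("a", "b"): different, ("b", "c"): different, ("a", "c"): different}
print(solve_fc_do({v: [1, 2, 3] for v in "abc"}, triangle))  # {'a': 1, 'b': 2, 'c': 3}
print(solve_fc_do({v: [1, 2] for v in "abc"}, triangle))     # None
```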

Two quantities are of interest when benchmarking a constraint solving process:
the number of constraint checks and the number of backtracks. As noted
in [12], the first one is a good indicator of performance, while the second one
corresponds to the size of the explored search space. What we present in our
results is, for those two quantities, the difference between:

1. solving a problem using solely DO and solving it using a static tree ordering
(TO) on the treelike parts;

2. the same, but preceded by a preprocessing step: AC alone in the first case
and AC with DAC on the treelike parts in the second one.

A  positive  difference  is  a  point  in  favor  of  the  SBSM  whereas  a  negative  one  means 
it  is  inadequate. 

4.1  Method 

All the problems we are testing are built on the same model, i.e. a central cyclic
problem connected to several treelike parts. The chosen central problem is the
well-known Zebra problem, which has only one solution.

The treelike parts vary in size and number. They are made of 1 to 5
constraints each, and 1 to 8 trees are connected to the cyclic problem. Since
the Zebra problem is composed of 62 constraints, the treelike parts then
represent from 8 to 65 percent of the total number of constraints in the
problem. The variables of the treelike extensions are given the same domains as
those of the Zebra problem.

For each experiment, the search for the first solution is repeated 50 times
with a different initial permutation of the variables. The measure we use is the
average of the results we obtain.




4.2  Results 

Figures 3, 4 and 5 display two graphs each. The left one concerns constraint
checks and the right one concerns backtracks. In each graph, two curves are
displayed. The black one represents the difference (AC, DO) − (DAC, TO) and the
grey one represents the difference (DO) − (TO). The abscissa represents a combination
of the size and number of treelike parts. It starts at size 1 for 1 to 8
trees, then size 2, also for 1 to 8 trees, etc., up to size 5.

– Figure 3 shows the results we obtained for trees made up of ternary constraints
with a high fail probability (0.96). They are positive when solving
without preprocessing, but we should mention that the standard deviation
(σ) of the measures for solving with DO is very high. For example, with 5
trees of size 3, the average count of consistency checks is 8858 but σ is 6546.
This means that the performance depends on the choice of the first instantiated
variable. If this variable belongs to a treelike part then, due to the
high filtering power of the constraints, the whole tree will be solved first
and the solving of the cyclic part will be repeated several times, leading to
disastrous results. On the contrary, if the first variable is chosen in the cyclic
part, the latter will be solved first, resulting in good performance. So, here,
tree ordering avoids dreadful mistakes.

On the other hand, when preprocessing is applied, the results are no longer
favorable to the SBSM. The preprocessing step reduces the domains so
that the first variable can be chosen wisely.

– Figure 4 shows the results for the opposite case. Here, trees are made
up of ternary constraints with a low fail probability (0.04). These results are
chaotic, be it with or without preprocessing. Therefore, no definite conclusion
can be drawn and, by default, the simplest solution should be preferred (i.e.
no use of the SBSM).

– Figure 5 shows the results for trees composed of constraints with a fail
probability individually and randomly chosen between 0.04 and 0.96. Again, the
results are such that no conclusion can be drawn.

Other experiments were made with various ranges of arity and satisfiability
for constraints, domain size for variables or branching factor for trees. They
could all be related to the previous cases: when difficult constraints
dominated in the treelike parts, the results were close to those of Fig. 3.

5  Conclusion 

We showed that a basic SBSM for $n$-ary constraint problems can be implemented
easily and at low cost. Nevertheless, from the experimental results, we conclude
that benefits can only be expected when the treelike parts of the problems are
made of constraints with a high filtering power and no preprocessing is
applied. This leaves only a narrow window for the applicability of such a method.


Fig. 5. Constraints with random fail probability




References 

1. E. Freuder. A sufficient condition for backtrack-free search. Journal of the ACM,
29(1):24–32, 1982.

2. E. Freuder. A sufficient condition for backtrack-bounded search. Journal of the
ACM, 32(4):755–761, 1985.

3. P. Meseguer. Constraint satisfaction problems: An overview. AI Communications,
2(1):3–17, 1989.

4. V. Kumar. Algorithms for constraint satisfaction problems: A survey. AI Magazine,
13(1):32–44, 1992.

5. U. Montanari. Networks of constraints: Fundamental properties and applications
to picture processing. Information Sciences, 7(3):95–132, 1974.

6. A. Mackworth. Consistency in networks of relations. Artificial Intelligence,
8:99–118, 1977.

7. R. Dechter and J. Pearl. Network-based heuristics for constraint-satisfaction
problems. Artificial Intelligence, 34:1–38, 1988.

8. F. Rossi, C. Petrie, and V. Dhar. On the equivalence of constraint satisfaction
problems. Technical Report ACT-AI-222-89, MCC, 1989.

9. P. Purdom. Search rearrangement backtracking and polynomial average time.
Artificial Intelligence, 21:117–133, 1983.

10. B. Nudel. Consistent labeling problems and their algorithms. Artificial
Intelligence, 21:135–178, 1983.

11. J. Gaschnig. Performance Measurement and Analysis of Certain Search Algorithms.
PhD thesis, Carnegie Mellon University, 1979.

12. R. Dechter and I. Meiri. Experimental evaluation of preprocessing techniques in
constraint satisfaction problems. In Proc. IJCAI, Detroit, Michigan, 1989.


Finite  Domain  Consistency  Techniques: 
Their  Combination  and  Application  in 
Computer-Aided  Process  Planning 


Manfred A. Meyer¹ and Jörg P. Müller²


¹ German Research Center for Artificial Intelligence (DFKI),
Erwin-Schrödinger-Straße, 6750 Kaiserslautern, Germany
² German Research Center for Artificial Intelligence (DFKI),
Stuhlsatzenhausweg 3, 6600 Saarbrücken 11, Germany


Abstract. In this paper we present the weak looking-ahead strategy
(WLA), a consistency technique on finite domains combining the computational
efficiency of forward-checking with the pruning power of looking-ahead.
We show that by integrating weak looking-ahead into PROLOG's
SLD resolution we obtain a sound and complete inference rule, whereas
standard looking-ahead itself has been shown to be incomplete. We outline
how we use the weak looking-ahead technique for lathe tool selection
in a CIM environment.


1  Introduction 

Constraint Logic Programming has been shown to be a very useful tool for
knowledge representation and problem-solving in different areas. Finite domain
extensions of PROLOG, together with efficient consistency techniques such as
forward-checking and looking-ahead, allow us to solve many discrete combinatorial
problems efficiently by restricting the search space in an a priori manner.
When we used these consistency techniques to implement real-world applications, it
turned out that forward-checking and looking-ahead, as provided in most finite-domain
PROLOG extensions, are not totally satisfactory. Forward-checking often
does not give any pruning at all until variables become singletons, whereas
standard looking-ahead forces strong pruning of the search space but induces
serious control overhead.

In this paper we present a consistency technique on finite domains which
combines the efficiency of forward-checking and the pruning power of standard
looking-ahead. The basic idea of this weak looking-ahead (WLA) strategy
is to apply looking-ahead only once to a constraint and to use forward-checking
for further restricting the domains of its arguments. Weak looking-ahead grew out
of a real-life application: in the ARC-TEC project at DFKI we have been developing
a knowledge-based system generating workplans for lathe CNC machines (µCAD2NC [2]).
After we decided to solve the subproblem of selecting appropriate lathe tools for
the various processing steps by using constraints, we experimented with several
consistency algorithms. Soon we realized that, on the one hand, forward-checking is too weak




for some applications where an earlier pruning of the search space is desired. On
the other hand, using looking-ahead for this application is a bit like breaking
a butterfly on a wheel. These observations led to the wish for a new technique
which costs only a little more than forward-checking, but which can
achieve much better pruning results in many cases (see the example in Section 4).

2  Finite  Domain  Consistency  Techniques 

Over the past years, increasing attention has been paid to using constraints in
logic programming [4, 11], since this combines the advantages of both logic
programming (declarativity, relational form, nondeterminism) and consistency
techniques for constraint satisfaction problems [7]. The
use of consistency techniques allows one to overcome the basic shortcomings of
logic programming languages, which are mainly caused by their poor, mostly
backtracking-based control strategies. Techniques such as forward-checking and
looking-ahead are used to restrict the domains of variables in an active manner
and to achieve an a priori pruning of the search space.

In this section we give a short informal description of both forward-checking
and looking-ahead according to [11]. In Section 3 we present the weak
looking-ahead method, which is essentially based on these techniques.

Forward-Checking: The idea of forward-checking is formally expressed by the
forward-checking inference rule (FCIR). Informally, a constraint C can be used
in a forward-checking manner as soon as all except one of its domain-variable
arguments, say X, are instantiated to a ground value. C is then called
forward-checkable. C can be considered a unary predicate C'(X), and the set of possible
values that can be given to X can be restricted to those elements a satisfying
C'(a).
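The rule can be illustrated outside PROLOG with a few lines of Python. This is a sketch only: the constraint is modeled, hypothetically, as a Python predicate over its arguments, `assignment` holds the ground arguments, and the function returns the restricted domain of the single free variable, or `None` when the constraint is not forward-checkable.

```python
def forward_check(predicate, args, assignment, domains):
    """Forward-checking inference rule (FCIR) for one constraint.

    `args` are the constraint's variables, `assignment` maps the already
    instantiated ones to ground values. If exactly one argument X is still
    free, the constraint is read as the unary predicate C'(X) and D(X) is
    restricted to the values a with C'(a). Returns the new domain of X, or
    None when the constraint is not forward-checkable."""
    free = [x for x in args if x not in assignment]
    if len(set(free)) != 1:
        return None
    x = free[0]

    def unary(a):  # C'(a): the constraint with every other argument fixed
        return predicate(*(a if y == x else assignment[y] for y in args))

    return {a for a in domains[x] if unary(a)}


def x_plus_y_eq_z(x, y, z):
    return x + y == z


domains = {"X": {1, 2, 3, 4, 5}}
print(forward_check(x_plus_y_eq_z, ("X", "Y", "Z"), {"Y": 2, "Z": 5}, domains))  # {3}
print(forward_check(x_plus_y_eq_z, ("X", "Y", "Z"), {"Z": 5}, domains))          # None
```

The second call shows the applicability precondition discussed below: with two free arguments, nothing can be pruned.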

Forward-checking  has  turned  out  to  be  one  of  the  most  popular  consistency 
techniques,  since  it  can  be  easily  implemented  (cf.  [10]),  it  yields  reasonable 
pruning  results  for  many  applications,  keeping  the  computational  costs  fairly 
low,  and  since  there  exist  sound  and  complete  proof  procedures  based  on  normal 
SLD  resolution  combined  with  forward-checking  (see  [11]). 

The main drawback of forward-checking is its strong applicability precondition:
a predicate can be executed by the FCIR only if all except one of its
variables are instantiated to a ground value. Thus, for predicates with many arguments
and/or many variables, at a given point of computation, there is only a
relatively small probability that forward-checking can be applied to them. Moreover,
especially when computation starts, it is very often the case that no constraint
is forward-checkable. This means that choices have to be made, i.e. variables are
instantiated in a more or less random manner. Thus, the devil of backtracking,
which we would like to exorcize by the use of consistency techniques, returns
through the back door. Finally, some constraints, such as =, >, and <, should




not be executed by forward-checking at all, because they embody a great deal
of structural information about the relation between their arguments¹.

Looking-Ahead: The looking-ahead strategy [6] offers a powerful means
to reduce the number of values that can be assigned to variables of a constraint,
even if this constraint is not yet forward-checkable.

For every domain variable X appearing as an argument of an n-ary constraint
C, and for every value within the domain of X, it must be checked whether
there exists at least one admissible value from the domain of each other domain
variable Y appearing in C such that the constraint C is satisfied. The arguments of C
which are not domain variables must be ground.
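A brute-force Python sketch of one looking-ahead step on a single n-ary constraint follows. It assumes all arguments are domain variables and encodes the constraint as a Python predicate (both assumptions for illustration). The nested enumeration over the other arguments' domains is what makes the rule expensive, as discussed next.

```python
from itertools import product


def look_ahead(predicate, args, domains):
    """Looking-ahead on one n-ary constraint whose arguments are all domain
    variables: a value a of argument x survives only if there exist values
    for the other arguments (one from each domain) such that the constraint
    holds. All new domains are computed from the old ones."""
    new = {}
    for i, x in enumerate(args):
        others = [domains[y] for j, y in enumerate(args) if j != i]
        new[x] = {
            a for a in domains[x]
            if any(predicate(*rest[:i], a, *rest[i:]) for rest in product(*others))
        }
    return new


def less(x, y):
    return x < y


print(look_ahead(less, ("X", "Y"), {"X": {1, 2, 3}, "Y": {1, 2, 3}}))
# {'X': {1, 2}, 'Y': {2, 3}}
```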

By  using  the  Looking-Ahead  Inference  Rule  (LAIR),  the  search  space  can  be 
pruned  at  an  early  stage  of  computation.  However,  the  trouble  with  standard 
looking-ahead  is  that  it  is  a  very  expensive  method  of  ensuring  arc-consistency. 
Therefore,  for  most  applications  it  is  considered  inappropriate  [3].  Nevertheless, 
it  would  be  a  shame  to  forgo  all  the  benefits  brought  about  by  the  strong  pruning 
capabilities  of  looking-ahead. 

In the next section, we present weak looking-ahead, which can be regarded as
a compromise between forward-checking and looking-ahead. Assume, e.g., that in
our looking-ahead example above we perform the first looking-ahead step
as shown, but after that we do not do any more looking-ahead and instead
solve the (now simplified) problem by normal resolution or by forward-checking.
This procedure expresses the main idea of the weak looking-ahead strategy, which
we discuss in more detail in the following.


3  The  Theoretical  Background  of  WLA 

The basic theoretical work in the area of using consistency techniques in logic
programming has been done by van Hentenryck [11]. This research has contributed
a great deal both to forming a solid framework and to preserving the logic
part of the programming languages developed, while achieving a much better
control behaviour than that of standard logic programming languages
such as PROLOG (cf. [5]).

In this section, we give a formal definition of the weak looking-ahead
inference rule and present its basic formal properties, such as the soundness
and completeness of the proof procedure defined on top of WLA. The terminology
we use, and the sense in which we use it, are basically the same as in [11].

The weak looking-ahead strategy combines the use of LAIR and FCIR. A
similar technique has been informally proposed in [3] as “first-order looking-ahead”.
We present a generalized technique we call weak looking-ahead. This
name seems more appropriate for expressing what the underlying algorithm really
does. The basic idea of WLA is that each constraint can be selected by
¹ For example, the information that two variables X and Y are equal should not only
be used if X or Y is ground. Rather, the equality constraint should be maintained
from the moment it has been stated (see [10]).




the looking-ahead part at most once, and that this should happen at an
appropriate time. After this, only the FCIR (or normal inference) can be applied
to it. This idea is captured by the following definitions.

Definition 1 (WLA-checkable). An atom $p(t_1, \dots, t_n)$ is called WLA-checkable
if $p$ is a constraint and

• $p(t_1, \dots, t_n)$ is lookahead-checkable and has not yet been selected by WLA,
or

• $p(t_1, \dots, t_n)$ is forward-checkable and has already been selected by WLA.

Definition 2 (WLA). Let $P$ be a program, $G_i = \texttt{?-}\ A_1, \dots, A_k, \dots, A_m$ a goal
and $\sigma_{i+1}$ a substitution. $G_{i+1}$ is derived by WLA along with $\sigma_{i+1}$ from $G_i$ and
$P$ if the following holds:

1. $A_k$ is WLA-checkable, with $x_1, \dots, x_n$ being the WLA variables in $A_k$.

2. If $A_k$ is lookahead-checkable and WLA has not yet been applied to $A_k$ in the
current proof, then go to 3; otherwise go to 7.

3. For each $x_j$, the new domain $e_j$ is

$e_j = \{\, v_j \in d_j \mid \exists v_1 \in d_1, \dots, v_{j-1} \in d_{j-1}, v_{j+1} \in d_{j+1}, \dots, v_n \in d_n : P \models \sigma(A_k),\ \sigma = \{x_1 \leftarrow v_1, \dots, x_n \leftarrow v_n\} \,\} \neq \emptyset$

4. $y_j$ is the constant $c$ if $e_j = \{c\}$, or a new domain variable ranging over $e_j$
otherwise.

5. $\sigma_{i+1} = \{x_1 \leftarrow y_1, \dots, x_n \leftarrow y_n\}$.

6. $G_{i+1}$ is either $\texttt{?-}\ \sigma_{i+1}(A_1, \dots, A_{k-1}, A_{k+1}, \dots, A_m)$ if at most one $y_j$ is a
domain variable, or $\texttt{?-}\ \sigma_{i+1}(A_1, \dots, A_m)$ otherwise. EXIT.

7. $A_k$ is forward-checkable; let $x_j$ be the forward variable inside $A_k$. Then the
new domain $e$ is defined as $e = \{a \in d \mid P \models A_k\{x_j \leftarrow a\}\} \neq \emptyset$.

8. $\sigma_{i+1}$ is defined as $\sigma_{i+1} = \{x_j \leftarrow c\}$ if $e = \{c\}$, and as $\sigma_{i+1} = \{x_j \leftarrow y_e\}$, where $y_e$ is a new domain
variable ranging over $e$, otherwise. $G_{i+1} = \texttt{?-}\ \sigma_{i+1}(A_1, \dots, A_{k-1}, A_{k+1}, \dots, A_m)$.
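The control regime behind this definition can be illustrated with a small propagation loop. The Python sketch below is our own simplification and not the authors' SLDW implementation: it works outside SLD resolution, treats a variable with a singleton domain as ground, encodes each constraint as a (predicate, argument tuple) pair (a hypothetical representation), applies looking-ahead to a constraint the first time it is selected and only forward-checking afterwards, and stops at a fixpoint or on an empty domain.

```python
from itertools import product


def look_ahead(pred, args, domains):
    """LAIR step on one constraint: keep a value of each argument only if
    some combination of values of the other arguments satisfies `pred`."""
    new = dict(domains)
    for i, x in enumerate(args):
        others = [domains[y] for y in args[:i] + args[i + 1:]]
        new[x] = {a for a in domains[x]
                  if any(pred(*r[:i], a, *r[i:]) for r in product(*others))}
    return new


def forward_check(pred, args, domains):
    """FCIR step: applicable when exactly one argument still has more than
    one value (singleton domains count as ground)."""
    fixed = [next(iter(domains[x])) for x in args]
    free = [i for i, x in enumerate(args) if len(domains[x]) > 1]
    if not free:  # all arguments ground: just test the constraint
        return domains if pred(*fixed) else {**domains, args[0]: set()}
    if len(free) != 1:
        return domains  # not forward-checkable yet
    i = free[0]
    new = dict(domains)
    new[args[i]] = {a for a in domains[args[i]]
                    if pred(*fixed[:i], a, *fixed[i + 1:])}
    return new


def wla(constraints, domains):
    """Each constraint is looked ahead at most once, the first time it is
    selected; afterwards it is only used for forward-checking."""
    looked, changed = set(), True
    while changed:
        changed = False
        for k, (pred, args) in enumerate(constraints):
            if k not in looked:
                new = look_ahead(pred, args, domains)
                looked.add(k)
            else:
                new = forward_check(pred, args, domains)
            if new != domains:
                domains, changed = new, True
            if any(not d for d in domains.values()):
                return None  # an empty domain: this branch fails
    return domains


def lt(u, v):
    return u < v


print(wla([(lt, ("X", "Y")), (lt, ("Y", "Z"))],
          {"X": {1, 2, 3}, "Y": {1, 2, 3}, "Z": {1, 2, 3}}))
# {'X': {1}, 'Y': {2}, 'Z': {3}}
```

In the example, the two initial looking-ahead steps reduce Y and Z to singletons, after which a single forward-checking step on X < Y finishes the job without any further looking-ahead.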

The main point of the above definition is point 6, which uses SLDFC-resolution
(SLD-resolution extended by the FCIR), whose soundness and completeness have
been proven, in order to finish the proof after some pre-pruning has been done
by using the LAIR in a definite way. Thus, if we want to prove soundness and
completeness of WLA, we basically have to check the LAIR part. Since this part
is invoked at most once for each goal, and since this happens as early
as possible (due to point 2 of the definition), the disadvantages of the LAIR, such
as its incompleteness and its high computational overhead, can be avoided.

Proposition 3 (Soundness of WLA). Let $P$ be a program and $G_i$ be the goal
$\texttt{?-}\ A_1, \dots, A_k, \dots, A_m$ with $A_k$ WLA-checkable. Let the goal $G_{i+1}$ be derived by
WLA along with $\sigma_{i+1}$ from $G_i, P$ as $G_{i+1} = \texttt{?-}\ \sigma_{i+1}(A_1, \dots, A_{k-1}, A_{k+1}, \dots, A_m)$.
$G_i$ is a logical consequence of $P$ iff $G_{i+1}$ is a logical consequence of $P$.

The next result concerns the completeness of WLA. It shows that we can
define a complete proof procedure using weak looking-ahead.




Definition 4 (SLDW-resolution). A first-order resolution proof procedure
is called SLDW-resolution if it uses weak looking-ahead for WLA-checkable
goals and normal SLDD-derivation (extended SLD-resolution based on domain-variable
unification) for other goals.

We can prove the completeness of such a proof procedure by making use of
the completeness of the FCIR, showing that by applying the LAIR at most
once to each goal, no solutions are lost. The completeness result is expressed by
the following proposition.

Proposition 5 (Completeness of WLA). Let $P$ be a logic program and $G$ a goal.
If there exists an SLDD-refutation of $P \cup \{G\}$, then there also exists an SLDW-refutation
of $P \cup \{G\}$. Moreover, if $\sigma$ is the answer substitution from the SLDD-refutation
of $P \cup \{G\}$, and $\rho$ is the answer substitution from the SLDW-refutation
of $P \cup \{G\}$, then $\rho \leq \sigma$.

For  the  proofs  of  propositions  3  and  5  we  refer  to  [10]. 

4  Using  Weak  Looking-Ahead  for  Tool  Selection 

In this section, we show the usability of the weak looking-ahead inference rule
with an example from the concrete domain of the ARC-TEC project [1] at DFKI,
which constitutes an AI approach to implementing the idea of computer-integrated
manufacturing (CIM). For its evaluation, an expert system for production planning
has been developed.

4.1 The Lathe-Tool Selection Problem

The application problem we deal with for the rest of this paper is to
find appropriate lathe tools to manufacture a given workpiece. Depending on the
shape, the material and other attributes of the lathe part to be manufactured,
the work-plan consists of a number of different steps. A typical work-plan may
provide one step for roughing, another step for finishing and a third (optional)
step for the fine finishing of the lathe part. For each processing step,
appropriate tools have to be chosen.

This tool selection depends heavily on many geometrical parameters (e.g. the edge-angle)
as well as technological ones (e.g. material, process, etc.). Moreover,
the tool system itself consists of subparts which have to be combined, e.g. the
tool holder, the material of the plate and its geometry. In practice, there are many
restrictions, e.g. which holder to use for which plate, or which kind of plate
geometry to use for which workpiece contour.

To keep things simple, we assume that a lathe tool consists of two basic
parts: the cutting plate, which actually cuts the material, and the tool holder,
which holds the cutting plate. In our application, we are concerned
with finding a well-suited tool (or rather, a number of well-suited tools) starting
from a set of constraints which describe the current problem. Lathe-tool selection
then results in a set of possible holder/plate combinations for each skeletal plan
or manufacturing feature.

When formalizing the tool selection problem as a CSP, the first thing we have
to do is to restrict the number of input parameters. For our small example we
use the variables Holder and Plate, denoting the tool holder and the cutting
plate, and Process and WP-Material, denoting the kind of processing and
the material of the lathe workpiece, respectively. Furthermore, we use three
variables denoting different angles, namely Beta-Max, Edge-Angle, and the
tool-cutting edge-angle TC-Edge-Angle. These angles are denoted by $\beta$, $\varepsilon$, and
$\chi$, respectively, in figure 1.


[Fig. 1: angles $\beta$, $\varepsilon$ and $\chi$ relative to the cutting direction]


Having identified the problem variables, we can put the constraints on the
variables. In the following, we consider only the most important constraints:
holder_tcea(Holder, TC-Edge-Angle) describes the functional relation between
a holder and its tool-cutting edge-angle; plate_ea(Plate, Edge-Angle) states
that each plate has its own edge-angle; compatible(Holder, Plate) expresses the
compatibility condition between tool holders and cutting plates;
hard_enough(Plate, WP-Material) states that different cutting plates have to
be used for materials with different degrees of hardness;
process_holder(Process, Holder) denotes the appropriate types of holders for
the different processing steps; process_edge_angle(Process, Edge-Angle)
expresses the rule of thumb that plates with big edge-angles should be chosen for
roughing, whereas smaller edge-angles are appropriate for finishing; finally,
TC-Edge-Angle + Edge-Angle + Beta-Max must be less than 180°. This last
constraint becomes evident when looking at figure 1, where the angles are denoted
by $\chi$, $\varepsilon$ and $\beta$, respectively.


4.2  A  FiDo  Program  for  Tool  Selection 

In this subsection, we propose a formalization and solution of the tool-selection
problem using FiDo, a Finite DOmain PROLOG extension developed at DFKI
(see [9] for a survey). We process the constraints defined above using weak
looking-ahead (WLA) and forward-checking consistency techniques. We give
a trace of the constraint propagation process and show the advantages of WLA
compared to standard looking-ahead or forward-checking techniques.
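The weak looking-ahead step can be made concrete with a minimal Python sketch. It rests on our own assumptions: it is not FiDo code, and the names `weak_lookahead`, `HOLDER_TCEA` and `PLATE_EA` are hypothetical. Domains are Python sets and a constraint is a predicate over its variables. The function runs the looking-ahead pass exactly once, removing every value that has no support in the other domains. It then reports whether the constraint must stay suspended as a forward-checking constraint. The data are the holder_tcea and plate_ea facts printed in Fig. 3, restricted to the -71 plates whose edge-angles are listed there.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Domains = Dict[str, Set[object]]


def weak_lookahead(variables: List[str],
                   check: Callable[..., bool],
                   domains: Domains) -> Tuple[Domains, bool]:
    """Run the looking-ahead step once for a single constraint.

    For each constrained variable in turn, keep only the values that have at
    least one supporting combination in the current domains of the other
    variables. The pass is never repeated for this constraint. Instead, the
    returned flag says whether it must stay suspended as a forward-checking
    constraint (more than one of its variables is still undetermined).
    """
    pruned = {name: set(values) for name, values in domains.items()}
    for i, var in enumerate(variables):
        others = [pruned[w] for w in variables[:i] + variables[i + 1:]]
        supported = set()
        for value in pruned[var]:
            for rest in product(*others):
                args = list(rest)
                args.insert(i, value)
                if check(*args):
                    supported.add(value)
                    break
        pruned[var] = supported
    undetermined = sum(1 for v in variables if len(pruned[v]) > 1)
    return pruned, undetermined > 1


# Facts (1) and (2) of Fig. 3 (only the printed entries), as Python sets.
HOLDER_TCEA = {("tmaxp-pt11", 80), ("tmaxp-pt12", 60), ("tmaxp-pt13", 45)}
PLATE_EA = {("cnmm-71", 80), ("tnmm-71", 60), ("dnmm-71", 55), ("vnmm-71", 35)}

doms = {
    "Holder": {"tmaxp-pt11", "tmaxp-pt12", "tmaxp-pt13"},
    "Plate": {"cnmm-71", "tnmm-71", "dnmm-71", "vnmm-71"},
    "TC-Edge-Angle": set(range(91)),  # angles domain 0..90
    "Edge-Angle": set(range(91)),
}
doms, s1 = weak_lookahead(["Holder", "TC-Edge-Angle"],
                          lambda h, a: (h, a) in HOLDER_TCEA, doms)
doms, s2 = weak_lookahead(["Plate", "Edge-Angle"],
                          lambda p, a: (p, a) in PLATE_EA, doms)
print(sorted(doms["TC-Edge-Angle"]), sorted(doms["Edge-Angle"]), s1, s2)
# [45, 60, 80] [35, 55, 60, 80] True True
```

With these facts, the sketch reproduces the domain reductions of steps 1 and 2 of the trace below. The Holder and Plate domains are left unchanged, and both constraints stay suspended because two of their variables are still undetermined.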


tool_sel(Holder, Plate, Process, Material, Beta-Max, Edge-Angle, TC-Edge-Angle) :-
    /* definition of the domains and domain variables */
    define_domain(holders, [Holder], [tmaxp-pt11, tmaxp-pt12, tmaxp-pt13]),
    define_domain(plates, [Plate], [dnmm-71, vnmm-71, cnmm-71, tnmm-71,
                                    dnmm-41, vnmm-41, cnmm-41, tnmm-41]),
    define_domain(processes, [Process], [roughing, finishing, fine_finishing]),
    define_domain(materials, [Material], [steel, cast, alu]),
    define_domain(angles, [Beta-Max, Edge-Angle, TC-Edge-Angle], 0..90),

    /* dynamic building of the constraint net */
    wla(holder_tcea(Holder, TC-Edge-Angle)),                /* (1) */
    wla(plate_ea(Plate, Edge-Angle)),                       /* (2) */
    wla(TC-Edge-Angle + Edge-Angle + Beta-Max < 180),       /* (3) */
    forward(compatible(Holder, Plate)),                     /* (4) */
    forward(hard_enough(Plate, Material)),                  /* (5) */
    forward(process_holder(Process, Holder)),               /* (6) */
    forward(process_edge_angle(Process, Edge-Angle)),       /* (7) */

    /* instantiate variables using a first-fail instantiation predicate */
    instantiate_dl([Holder, Plate, Process, Material, Beta-Max, Edge-Angle, TC-Edge-Angle]).


Fig.  2.  A  FiDo  Program  for  Tool  Selection 


/* definitions of the constraints */

/* (1) */
holder_tcea(tmaxp-pt11, 80).
holder_tcea(tmaxp-pt12, 60).
holder_tcea(tmaxp-pt13, 45).

/* (2) */
plate_ea(cnmm-71, 80).
plate_ea(tnmm-71, 60).
plate_ea(dnmm-71, 55).
plate_ea(vnmm-71, 35).

/* (4) */
compatible(tmaxp-pt11, cnmm-71).
compatible(tmaxp-pt11, tnmm-71).
compatible(tmaxp-pt11, dnmm-71).
compatible(tmaxp-pt11, vnmm-71).
...

/* (5) */
hard_enough(cnmm-71, cast).
hard_enough(tnmm-71, cast).
hard_enough(dnmm-71, cast).
hard_enough(vnmm-71, cast).
hard_enough(cnmm-41, cast).
hard_enough(tnmm-71, cast).
hard_enough(dnmm-71, cast).

/* (6) */
process_holder(roughing, tmaxp-pt11).
process_holder(roughing, tmaxp-pt12).
process_holder(roughing, tmaxp-pt13).

/* (7) */
process_edge_angle(roughing, Edge-Angle) :-
    forward(Edge-Angle < 65),
    ...

Fig.  3.  The  Tool  Selection  Database 


In the following we show how the program behaves in a concrete example.
To make the constraints easy to reference, we numbered them consecutively
from 1 to 7 in figures 2 and 3. Assume that the following call is performed:

?- tool_sel(Holder, Plate, roughing, cast, 70, EA, TC-EA).

This call to the tool-selection program provides the following input data: it binds
the variables Process, Material, and Beta-Max to roughing, cast, and 70°,
respectively. The variables Holder, Plate, TC-EA and EA are the
output variables of this call. Now let us see how the call is processed:




1. First, the constraint holder_tcea (constraint #1) is executed in a
weak looking-ahead manner. The holder domain is not changed at
all, but the domain of the variable TC-Edge-Angle is restricted to the values
{45, 60, 80}. The constraint is then reformulated as a forward-checking
constraint and suspended, since it is not yet forward-checkable.

2. The same happens to constraint #2: the variable Plate is left
unchanged, the domain of the variable Edge-Angle is restricted to
{35, 55, 60, 80}, and the forward-checking version of the
constraint is suspended.

3. Now constraint #3 is executed. This is the constraint which
benefits most from the WLA control. By performing the looking-ahead
part of the weak looking-ahead algorithm, the domains of the variables
TC-Edge-Angle and Edge-Angle are restricted to {45, 60} and {35},
respectively, so the Edge-Angle variable becomes instantiated. Since constraint
#3 is suspended as forward(TC-Edge-Angle < 75), the instantiation of
Edge-Angle causes constraints #2 and #7 to fire.

4. Constraint #2 is woken up and restricts the Plate variable to the values
{vnmm-71, vnmm-41}. Then constraint #7, both of whose variables are
instantiated, is checked successfully. Thus, constraints #2 and #7 are done.

5. Constraint #5 is forward-checkable, since the workpiece material has been
determined by the call to the main goal tool_sel. Since vnmm-41 is not
suitable for processing cast iron, the Plate domain is restricted to the
set {vnmm-71}. Thus, Plate is instantiated to its single remaining value vnmm-71.

6. Due to the instantiation of Plate, constraint #4 becomes forward-checkable;
it is woken up and restricts the Holder domain to the values {tmaxp-pt11,
tmaxp-pt12}.

7. Now constraint #6 is checked. Because the variable Process has been
initialized to roughing, the constraint is forward-checkable. However, since
both holders are appropriate for roughing, the Holder domain is not
restricted any further.

At  this  intermediate  stage,  the  values  of  the  variables  are  as  follows: 

Holder = {tmaxp-pt11, tmaxp-pt12}, Plate = vnmm-71, Material = cast,
Process = roughing, Edge-Angle = 35, and TC-Edge-Angle = {45, 60}.

Then, the instantiation predicate instantiate_dl (instantiation with first-fail
heuristics using the domain lengths) is called.

8. The variable Holder is instantiated to the first element of its domain, which is
tmaxp-pt11. This instantiation wakes up constraint #1, which fails because
a tool-cutting edge-angle of 80° is no longer allowed. Thus, for the first time,
backtracking is necessary.

9. On backtracking, Holder is instantiated to the last remaining value, namely
tmaxp-pt12. This wakes up constraint #1 and instantiates the TC-Edge-Angle
variable to the value 60.


Finally, we have found an admissible pair (tmaxp-pt12, vnmm-71), consisting
of a holder and a cutting plate, which satisfies the initial constraints. The solution
was reached with only two choices, one of them made on backtracking.
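The labelling phase of steps 8 and 9 can be sketched as a depth-first search that uses the first-fail heuristic and forward checking. This Python version is a simplification under stated assumptions, not FiDo's instantiate_dl:

- It starts from the intermediate state listed above and keeps only the variables that are still open (Holder and TC-Edge-Angle).
- It models only constraint #1 and the residual TC-Edge-Angle < 75 of constraint #3.
- The names `first_fail`, `forward_check` and the `tried` log are hypothetical.

Ties in domain size are broken by list order and values are tried in sorted order. Here, this makes tmaxp-pt11 the first choice, as in step 8.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Domains = Dict[str, Set[object]]
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


def forward_check(assignment: Dict[str, object], domains: Domains,
                  constraints: List[Constraint]) -> Optional[Domains]:
    """Test fully instantiated constraints and prune the free variable of every
    constraint with exactly one variable left. Returns the pruned domains,
    or None if a constraint fails or a domain becomes empty."""
    new = {name: set(values) for name, values in domains.items()}
    for names, check in constraints:
        free = [n for n in names if n not in assignment]
        if not free:
            if not check(*(assignment[n] for n in names)):
                return None
        elif len(free) == 1:
            f = free[0]
            new[f] = {x for x in new[f]
                      if check(*(x if n == f else assignment[n] for n in names))}
            if not new[f]:
                return None
    return new


def first_fail(variables: List[str], domains: Domains,
               constraints: List[Constraint],
               assignment: Optional[Dict[str, object]] = None,
               tried: Optional[List[Tuple[str, object]]] = None
               ) -> Optional[Dict[str, object]]:
    """Pick the open variable with the smallest domain (ties: earliest in
    `variables`), try its values in sorted order, backtrack on failure."""
    assignment = dict(assignment or {})
    open_vars = [v for v in variables if v not in assignment]
    if not open_vars:
        return assignment
    var = min(open_vars, key=lambda v: len(domains[v]))
    for value in sorted(domains[var]):
        if tried is not None:
            tried.append((var, value))
        assignment[var] = value
        pruned = forward_check(assignment, {**domains, var: {value}}, constraints)
        if pruned is not None:
            result = first_fail(variables, pruned, constraints, assignment, tried)
            if result is not None:
                return result
        del assignment[var]  # undo the choice before trying the next value
    return None


HOLDER_TCEA = {("tmaxp-pt11", 80), ("tmaxp-pt12", 60), ("tmaxp-pt13", 45)}

# Intermediate state after steps 1-7 (only the variables that are still open).
domains = {"Holder": {"tmaxp-pt11", "tmaxp-pt12"}, "TC-Edge-Angle": {45, 60}}
constraints = [
    (("Holder", "TC-Edge-Angle"), lambda h, a: (h, a) in HOLDER_TCEA),  # (1)
    (("TC-Edge-Angle",), lambda a: a < 75),  # residual of (3): 180 - 70 - 35
]
tried = []
solution = first_fail(["Holder", "TC-Edge-Angle"], domains, constraints, tried=tried)
print(solution)  # {'Holder': 'tmaxp-pt12', 'TC-Edge-Angle': 60}
print(tried)     # [('Holder', 'tmaxp-pt11'), ('Holder', 'tmaxp-pt12'), ('TC-Edge-Angle', 60)]
```

The `tried` log contains three entries: the failed first choice, the choice made on backtracking, and the forced assignment of the TC-Edge-Angle domain once it has become a singleton.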

Let us now evaluate the program behaviour shown above. Here, we would like
to stress the effects of using weak looking-ahead rather than forward-checking
or standard looking-ahead for some of the constraints in our example. First, it is
clear that weak looking-ahead yields much better pruning than forward-checking.
For instance, by using a weak looking-ahead (wla) instead of a forward
declaration for constraints #1 to #3 in our example, we could immediately
and substantially restrict the domains of several variables. With forward-checking
control, we would have had to wait until one of the constraint variables
had been instantiated.

Secondly, we have to answer the question of what advantages weak
looking-ahead has over standard looking-ahead. By applying the LAIR at most
once to each constraint, we avoid the high expense of re-checking whether it can be
applied at a later stage of the computation: some work has
to be done, but only once. This especially concerns constraint
#3. After applying the LAIR to it, all we have to do afterwards is to
check whether a value is assigned to any of its variables. With looking-ahead,
we would have had to react each time a value is removed from
the domain of one of the constraint variables, and perform a new looking-ahead check
then. Applying the LAIR only once and continuing with a forward-checking
execution of the constraint still allows the construction of a sound and complete proof
procedure, as pointed out in section 3. It thus combines the simplicity
of forward-checking with the power of looking-ahead, while avoiding the high
computational cost of the latter.

Finally, we consider when the looking-ahead step in WLA should
be done. In our experience, useful control strategies perform the step as
early as possible, or for a constraint as soon as it has been touched for the first
time, i.e. as soon as one of its constraint variables has been restricted.


5  Conclusions 

Finite domain consistency techniques like forward-checking and looking-ahead
help make constraint logic programming an appropriate tool for expressing
and solving a rich class of combinatorial problems. When applying these
techniques in a real-world application, the need for a combination of forward-checking
and looking-ahead arose. In this paper, we have presented the weak
looking-ahead technique, which combines the pruning power of looking-ahead with
the efficiency of forward-checking. We have shown that the resulting inference
rule is sound and complete. Finally, we have shown how to use weak looking-ahead
for lathe tool-selection in a CIM environment. The language FiDo, which
we have used for this application, has been developed at DFKI and will be further
extended to support hierarchically structured domains and to make use of
their structure for both expressing and applying constraints over them [8].




Acknowledgements 

The work presented in this paper has been done within the ARC-TEC project
at DFKI and was partly supported by the German Federal Ministry of Research
and Technology (BMFT) under grant ITW 8902 C4.

We would like to thank Jane Bensch, Hans-Günther Hein and Eva Volker for
proofreading earlier versions of this paper. The three anonymous referees also
provided useful comments on the organisation of the paper.

References 

1. A. Bernardi, H. Boley, K. Hinkelmann, P. Hanschke, C. Klauck, O. Kuhn, R. Legleitner,
M. Meyer, M. M. Richter, G. Schmidt, F. Schmalhofer, and W. Sommer.
ARC-TEC: Acquisition, Representation and Compilation of Technical Knowledge.
In Expert Systems and their Applications: Tools, Techniques and Methods, Avignon,
France, 1991. Also available as Research Report RR-91-27, DFKI GmbH, P. O.
Box 2080, D-6750 Kaiserslautern.

2. H. Boley, P. Hanschke, M. Harm, K. Hinkelmann, T. Labisch, M. Meyer, J. Muller,
T. Oltzen, M. Sintek, W. Stein, and F. Steinle. μCAD2NC: A declarative
lathe-workplanning model transforming CAD-like geometries into abstract NC programs.
Technical Report D-91-15, DFKI GmbH, P. O. Box 2080, D-6750 Kaiserslautern,
November 1991.

3. D. de Schreye, D. Pollet, J. Ronsyn, and M. Bruynooghe. Implementing Finite-Domain
Constraint Logic Programming on Top of a PROLOG-System with Delay-mechanism.
In N. Jones, editor, Proc. ESOP'90, pages 106-117, 1990.

4. J. Jaffar and J.-L. Lassez. Constraint Logic Programming. In Proc. POPL-87,
Munich, Germany, 1987.

5. J. Jaffar, S. Michaylov, P. Stuckey, and R. Yap. The CLP(R) Language and System.
Technical Report CMU-CS-90-181, School of Computer Science, Carnegie
Mellon University, Pittsburgh, October 1990.

6.  A.K.  Mackworth.  Consistency  in  Networks  of  Relations.  AI  Journal,  8(1):99-118, 
1977. 

7. P. Meseguer. Constraint Satisfaction Problems: An Overview. AICOM, 2(1):3-17,
1989.

8. M. Meyer. Using Hierarchical Constraint Satisfaction for Lathe-Tool Selection in
a CIM Environment. In Fifth International Symposium on Artificial Intelligence,
pages 167-177. AAAI Press, December 1992.

9. M. Meyer, H.-G. Hein, and J. Muller. FIDO: Finite Domain Consistency Techniques
in Logic Programming. In A. Voronkov, editor, Logic Programming: Proceedings
of the 1st and 2nd Russian Conferences, volume 592 of LNAI, pages 294-301.
Springer-Verlag, Berlin, Heidelberg, 1992.

10. J. Muller. Design and Implementation of a Finite Domain Constraint Logic Programming
System based on PROLOG with Coroutining. Diploma thesis, Computer
Science Department, University of Kaiserslautern, 1991. Also available as
Technical Report D-91-02, DFKI GmbH, P. O. Box 2080, D-6750 Kaiserslautern.
12.  P.  van  Hentenryck.  Constraint  Satisfaction  in  Logic  Programming.  MIT  Press, 
Cambridge,  1989. 


Should  Decision  Trees  be  Learned  from  Examples 
or  from  Decision  Rules? 

Ibrahim  F.  Imam  and  Ryszard  S.  Michalski 

Center for Artificial Intelligence
George  Mason  University 
iimam@aic.gmu.edu & michalski@aic.gmu.edu

ABSTRACT 

A standard method for determining decision trees is to learn them from examples. A disadvantage
of this approach is that once a decision tree is learned, it is difficult to modify it to suit different
decision making situations. An attractive approach that avoids this problem is to learn and store
knowledge in a declarative form, e.g., as decision rules, and then, whenever needed, generate
from it the decision tree that is most suitable in any given situation. This paper describes an efficient
method for this purpose, called AQDT-1, which takes decision rules generated by the learning
system AQ15 and builds from them a decision tree optimized according to a given quality
criterion. The method is able to build conventional decision trees, as well as the so-called “skip-node”
trees, in which measuring attributes assigned to some nodes may be avoided. It is shown
that “skip-node” trees can be significantly simpler than conventional ones. In the experiments
comparing AQDT-1 with C4.5, the former outperformed the latter both in terms of predictive
accuracy and in the simplicity of the generated decision trees.

Key words: machine learning, inductive learning, decision trees, decision rules.

1.  Introduction 

Methods for learning decision trees from examples have been quite popular in machine
learning due to their simplicity. Decision trees built this way can be quite efficient, as
long as the decision making situations for which they are optimized remain relatively
stable. Problems arise when these situations change, or when the assumptions under which the
tree was built do not hold. For example, in some situations it may be very difficult to
determine the value of a certain attribute on the path from the root. One would like to
avoid measuring this attribute and still be able to classify the example. If the cost of
measuring various attributes changes, it is desirable to restructure the tree so that the
“inexpensive” attributes are evaluated first. A restructuring is also desirable if there is a
significant variation in the frequency of occurrence of examples from different classes.
Restructuring a decision tree is, however, difficult because the tree represents a form of
procedural knowledge.

An attractive alternative that avoids the above problems is to learn and store knowledge
in the form of decision rules, and to generate from them an appropriate decision tree
“dynamically,” as needed. Decision rules represent knowledge declaratively, and thus
do not impose any order on the evaluation of the attributes. Since the number of rules is
typically much smaller than the number of examples, generating decision trees from
rules can potentially be very fast. This way, one can always generate a tree that is
tailored to the specific decision situation. For example, one may be able to generate a
decision tree that avoids evaluating an attribute that is difficult or impossible to
measure, or a decision tree that fits a particular distribution of the decision
classes well. In some situations, it may be unnecessary to generate a complete decision tree;
only the part with the leaves associated with the decision classes of interest may be needed.
The proposed approach would generate only this part, which could be
interpreted as the generation of “virtual” decision trees.

A disadvantage of this approach is that it requires determining decision rules first. There
are, however, very efficient methods for generating decision rules. Also, the rules need
to be generated only once, and can then be used many times for generating any type of
decision tree according to various decision making situations.

This paper presents a simple and efficient method for generating decision trees from
decision rules. The method employs the AQ algorithm for generating rules. The paper also
reports results from experiments comparing it with a well-known method for learning
decision trees from examples, implemented in the C4.5 program.

2. Generating Decision Trees from Decision Rules: AQDT-1

Decision trees are normally generated from examples of decisions. The essential aspect
of any method for this purpose is the attribute selection criterion used for
choosing the attributes to be assigned to the nodes of the tree. Among well-known criteria
are the entropy reduction measure (e.g., Quinlan, 1979), the gini index of diversity
(Breiman, et al., 1984), and others (Cestnik & Karalic, 1991; Mingers, 1989).

The first algorithm for generating decision trees from examples was proposed by Hunt,
Marin and Stone (1966). This algorithm was subsequently modified by Quinlan (1979,
1983), and then improved and/or applied by many authors to a variety of learning
problems (e.g., Quinlan, 1983; Breiman, et al., 1984). Later, Quinlan (1986), and Bratko
and Kononenko (1987) added “tree pruning” procedures to handle data with noise.

In contrast to the above, the proposed method generates decision trees from decision
rules. The method, called AQDT-1 (AQ-based Decision Trees), assumes that decision
rules are generated from examples by the inductive learning system AQ15 (Michalski et
al., 1986). AQDT-1 uses an attribute selection criterion that is based on the properties of
the rules, rather than the properties of the training examples; one rule may describe a
large number of examples. Decision rules are a more powerful knowledge representation
than decision trees, because they can directly represent any description in disjunctive
normal form, while the latter can directly represent only a disjunctive normal form in
which all conjunctions are mutually disjoint. Therefore, when transforming a set of
arbitrary decision rules into a decision tree, one faces the additional problem of handling
logically intersecting rules (conjunctions).
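A small intersection test shows concretely how DNF rules differ from tree paths. In this Python sketch (our own illustration; the rule encoding and `rules_intersect` are hypothetical), a conjunctive rule maps each attribute to its set of admissible values. Two rules intersect logically when some example satisfies both, i.e. when their value sets overlap on every attribute they share. Attributes that a rule does not mention are assumed to be unconstrained. Intersecting rules cannot simply become disjoint paths of a decision tree.

```python
from typing import Dict, FrozenSet

# A conjunctive rule: attribute -> admissible values (internal disjunction).
Rule = Dict[str, FrozenSet[int]]


def rules_intersect(r1: Rule, r2: Rule) -> bool:
    """True iff some example can satisfy both rules.

    Attributes mentioned by only one rule impose no conflict, so the rules
    intersect exactly when, for every attribute they share, their value
    sets overlap.
    """
    return all(r1[a] & r2[a] for a in r1.keys() & r2.keys())


# Two C1 rules and one C3 rule from the example given later in this section.
c1_a = {"x1": frozenset({3}), "x2": frozenset({2})}
c1_b = {"x1": frozenset({3}), "x3": frozenset({1, 3}), "x4": frozenset({1})}
c3_a = {"x1": frozenset({1}), "x2": frozenset({1})}

print(rules_intersect(c1_a, c1_b))  # True: x1=3, x2=2, x3=1, x4=1 fits both
print(rules_intersect(c1_a, c3_a))  # False: they disagree on x1
```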




The proposed method for solving the first problem, i.e., choosing attributes on the basis
of the properties of rules, employs a measure of the “utility” of an attribute for reducing a
given set of rules, together with some other criteria. The second problem (handling non-disjoint
rules) can be resolved by introducing additional nodes in the decision tree, or by
assigning a “Don't care” value to some branches. This value serves as a connecting
edge between subtrees corresponding to non-disjoint rules. The branch with such a
“value” is traversed when one does not know, or wants to ignore, the value of the
attribute assigned to the node from which it stems. Decision trees that have nodes with
“don't care” values are called, for short, “skip-node” trees.

The input to the program AQDT-1 consists of rules generated by AQ15. Each such rule
represents a conjunction of conditions. AQDT-1 creates a data structure for each
concept description (a set of rules). This structure has fields such as the number of rules,
the number of conditions in each rule, and the number of attributes in the rules. The
system also creates an array of attribute descriptions. Each attribute description contains
the attribute's name, domain, type, the number of legal values, a list of the values, the
number of rules that contain that attribute, and the values of that attribute for each rule. The
attributes are arranged in the array in lexicographic order: first in descending
order of the number of rules that contain the attribute, and second, in ascending
order of the number of the attribute's legal values. AQDT-1 constructs a decision tree
from decision rules by recursively selecting the “best” attribute at a given step and
assigning it to the new node. The difference between building a decision tree from rules
and building it from examples is that in the former, attributes are selected on the basis of
the role they play in the rules, rather than on the basis of their coverage of
examples, as in learning from examples.
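The ordering of the attribute array amounts to a two-key sort. The Python below illustrates it under our own assumptions and is not the AQDT-1 code. Rules are encoded as dictionaries of value sets, and `order_attributes` is a hypothetical name. The legal domains of the example attributes are not given, so the number of distinct values occurring in the rules stands in for the number of legal values.

```python
from typing import Dict, List, Set

Rule = Dict[str, Set[int]]  # attribute -> admissible values


def order_attributes(rules: List[Rule], n_legal_values: Dict[str, int]) -> List[str]:
    """Arrange attributes lexicographically: first by descending number of
    rules that contain the attribute, then by ascending number of legal values.
    Python's sort is stable, so remaining ties keep their input order."""
    in_rules = {a: sum(1 for r in rules if a in r) for a in n_legal_values}
    return sorted(n_legal_values, key=lambda a: (-in_rules[a], n_legal_values[a]))


RULES = [  # the six conjunctions of the C1, C2, C3 example of this section
    {"x1": {3}, "x2": {2}}, {"x1": {3}, "x3": {1, 3}, "x4": {1}},
    {"x1": {1, 2}, "x2": {3, 4}}, {"x1": {2}, "x3": {1, 2}, "x4": {2}},
    {"x1": {1}, "x2": {1}}, {"x1": {4}, "x3": {2, 3}, "x4": {3}},
]
ATTRS = ["x1", "x2", "x3", "x4"]
# Assumption: distinct values occurring in the rules stand in for legal values.
LEGAL = {a: len(set().union(*(r[a] for r in RULES if a in r))) for a in ATTRS}
print(LEGAL)                           # {'x1': 4, 'x2': 4, 'x3': 3, 'x4': 3}
print(order_attributes(RULES, LEGAL))  # ['x1', 'x3', 'x4', 'x2']
```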

The criterion for attribute selection should reflect the decision making situation. For
example, if measuring different attributes involves different costs, then these costs
should be included in the criterion. If some decision classes occur much more frequently
than others, the criterion should favor measuring the attributes that occur in the rules for
these classes. To be able to compare the AQDT-1 method with the C4.5 program, we
assume here a selection criterion that is oriented toward producing trees with the
minimum number of nodes. This criterion is composed of three elementary criteria: 1)
the total attribute utility; 2) the number of different attribute values in the rules; and 3)
the number of rules that contain the given attribute. The total attribute utility is the sum
of the class utilities, i.e., the utilities of the attribute for each decision class. Suppose that
the decision classes are $C_1, C_2, \ldots, C_m$. Suppose further that $V_1, V_2, \ldots, V_m$ are the sets of
values of some attribute $A$ that occur in the rules for classes $C_1, C_2, \ldots, C_m$, respectively.
The utility of $A$ for class $C_i$, $U(A, C_i)$, is the number of sets $V_j$ ($j = 1, \ldots, m$, and $j \neq i$)
that are disjoint with $V_i$, plus 1. The total utility, $U(A)$, of the attribute $A$ is:

$$U(A) = \sum_{i=1}^{m} U(A, C_i) \qquad (1)$$




The total utility of an attribute is highest ($m^2$) when the attribute occurs in every
class description and has a different value in each of them.
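Formula (1) translates directly into code. This Python sketch (function names hypothetical) computes the class utilities and the total utility from the value sets $V_i$ of the example given below. It assumes that every class description mentions the attribute: in this code, an empty $V_i$ would count as disjoint with every other set.

```python
from typing import List, Set


def class_utility(value_sets: List[Set[int]], i: int) -> int:
    """U(A, C_i): number of other classes C_j (j != i) whose value set of A
    is disjoint with V_i, plus 1."""
    return 1 + sum(1 for j, vj in enumerate(value_sets)
                   if j != i and not (value_sets[i] & vj))


def total_utility(value_sets: List[Set[int]]) -> int:
    """U(A) = sum over i = 1..m of U(A, C_i), i.e. formula (1)."""
    return sum(class_utility(value_sets, i) for i in range(len(value_sets)))


# V_1, V_2, V_3: values of each attribute in the rules of C1, C2, C3.
VALUES = {
    "x1": [{3}, {1, 2}, {1, 4}],
    "x2": [{2}, {3, 4}, {1}],
    "x3": [{1, 3}, {1, 2}, {2, 3}],
    "x4": [{1}, {2}, {3}],
}
for attr, sets in VALUES.items():
    print(attr, [class_utility(sets, i) for i in range(3)], total_utility(sets))
# x1 [3, 2, 2] 7
# x2 [3, 3, 3] 9
# x3 [1, 1, 1] 3
# x4 [3, 3, 3] 9
```

For m = 3 classes, the maximum $m^2 = 9$ is reached by x2 and x4, whose value sets are pairwise disjoint.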

The  second  criterion  prefers  attributes  that  have  fewer  values  in  the  rules,  because 
nodes  that  are  assigned  such  attributes  will  have  a  smaller  fan  out  (in  the  case  of 
continuous  attributes,  they  are  quantized,  and  their  new  “values”  are  ranges  of  original 
values).  The  third  criterion  prefers  the  attribute  that  occurs  in  a  larger  number  of  rules, 
because  this  can  help  to  evaluate  all  the  rules  faster. 

These  three  criteria  are  combined  into  one  attribute  ranking  measure  using  the 
“lexicographical  evaluation  functional  with  tolerances”  (LEF;  Michalski,  1973).  First, 
the  method  evaluates  attributes  on  the  basis  of  their  utility.  If  two  or  more  attributes 
share  the  same  top  score  or  their  scores  are  within  the  assumed  tolerance  range,  the 
method  evaluates  these  attributes  using  the  second  criterion  (other  attributes  are 
ignored).  If  again  two  or  more  attributes  share  the  same  top  score,  or  their  scores  are 
within  the  tolerance  range,  then  the  third  criterion  is  used.  If  there  is  still  a  tie,  the 
method  selects  the  “best”  attribute  randomly. 
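The LEF combination of the three criteria can be sketched as successive filtering. This Python illustration rests on several assumptions:

- `lef_select` is a hypothetical name.
- The tolerance is taken to be an absolute difference from the best score; the text does not say how tolerances are expressed.
- The scores come from the example below: total utilities from Figure 1, the number of values in the rules, and the number of elementary rules containing each attribute (counted by us from that example).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# (score function, tolerance, True if larger scores are better)
Criterion = Tuple[Callable[[str], float], float, bool]


def lef_select(attributes: Sequence[str], criteria: List[Criterion],
               rng: Optional[random.Random] = None) -> str:
    """Lexicographic evaluation functional with tolerances (a sketch).

    Criteria are applied in order. After each one, only attributes whose
    score lies within `tolerance` of the best score remain; the others are
    ignored from then on. A tie that survives all criteria is broken randomly.
    """
    candidates = list(attributes)
    for score, tolerance, maximize in criteria:
        value = {a: score(a) if maximize else -score(a) for a in candidates}
        best = max(value.values())
        candidates = [a for a in candidates if best - value[a] <= tolerance]
        if len(candidates) == 1:
            return candidates[0]
    return (rng or random.Random()).choice(candidates)


UTILITY = {"x1": 7, "x2": 9, "x3": 3, "x4": 9}   # criterion 1 (maximize)
N_VALUES = {"x1": 4, "x2": 4, "x3": 3, "x4": 3}  # criterion 2 (minimize)
N_RULES = {"x1": 12, "x2": 6, "x3": 6, "x4": 6}  # criterion 3 (maximize)

criteria = [(UTILITY.get, 0, True), (N_VALUES.get, 0, False), (N_RULES.get, 0, True)]
print(lef_select(["x1", "x2", "x3", "x4"], criteria))  # x4
```

The utility criterion leaves x2 and x4, and the second criterion then prefers x4. This is the choice for the root made in the example below.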

The second problem of forming decision trees from decision rules is how to start from a
single root and represent all the rules in a decision tree, even in cases when some rules
do not contain common attributes. One way of overcoming this problem is to
create additional nodes corresponding to “missing” attributes. This method can
significantly increase the size of the tree. Another approach, adopted in AQDT-1, is to
create a “Don't care” value for attributes that may not be necessary to evaluate.

The following simple example illustrates the AQDT-1 method. Suppose there are three
decision classes, C1, C2 and C3, described by the following AQ15-derived DNF
expressions:

C1 <= [x1=3]&[x2=2] v [x1=3]&[x3=1 v 3]&[x4=1]

C2 <= [x1=1 v 2]&[x2=3 v 4] v [x1=2]&[x3=1 v 2]&[x4=2]

C3 <= [x1=1]&[x2=1] v [x1=4]&[x3=2 v 3]&[x4=3]

The  method  turns  such  descriptions  into  elementary  rules,  which  have  only  one  attribute 
value  in  each  condition  (no  internal  disjunction).  This  is  done  by  "multiplying  out"  each 
conjunction  in  the  DNF  expressions. 
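Multiplying out is a Cartesian product over the internal disjunctions of a conjunction. The Python sketch below applies it to the three class descriptions above. The encoding and the name `multiply_out` are hypothetical, not AQDT-1's data structure. The resulting counts agree with the 3, 6 and 3 rules listed for C1, C2 and C3 in Figure 1.

```python
from itertools import product
from typing import Dict, List, Tuple

DnfRule = Dict[str, Tuple[int, ...]]  # attribute -> values joined by "v"
Elementary = Dict[str, int]           # attribute -> exactly one value


def multiply_out(rule: DnfRule) -> List[Elementary]:
    """Turn one conjunction with internal disjunctions into elementary rules
    (one value per condition) by taking the Cartesian product of the values."""
    attrs = list(rule)
    return [dict(zip(attrs, combo)) for combo in product(*(rule[a] for a in attrs))]


CLASSES: Dict[str, List[DnfRule]] = {
    "C1": [{"x1": (3,), "x2": (2,)}, {"x1": (3,), "x3": (1, 3), "x4": (1,)}],
    "C2": [{"x1": (1, 2), "x2": (3, 4)}, {"x1": (2,), "x3": (1, 2), "x4": (2,)}],
    "C3": [{"x1": (1,), "x2": (1,)}, {"x1": (4,), "x3": (2, 3), "x4": (3,)}],
}
elementary = {c: [e for r in rules for e in multiply_out(r)] for c, rules in CLASSES.items()}
print({c: len(rs) for c, rs in elementary.items()})  # {'C1': 3, 'C2': 6, 'C3': 3}
```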

Figure 1 illustrates the process of selecting an attribute for a node in the tree based on
the set of elementary rules. The rows “Values in Ci” list the values of the attribute in the
rules of the decision class Ci. “Value sharing” indicates whether the values of that
attribute in the rules of a given class are or are not present in the rules of other classes.
Attributes x2 and x4 both have a total utility of 9. Therefore, they are evaluated on the
second criterion (the number of attribute values in the rules). Attribute x4 has fewer values
(3) than x2 (4); therefore, it is assigned to the root of the tree. In the next step, AQDT-1
modifies its data structure to eliminate the rules containing x4 from all the concept
descriptions, and marks x4 as an “in_tree” attribute. This process is repeated until all
rules are eliminated. In this example, there will be two iterations. The second attribute
to be chosen is x2, and a don't care value will be added to x4 to merge the tree.


Concept         Row                   x1      x2      x3      x4
C1 (3 rules)    Values in C1          3       2       1, 3    1
                Value sharing         No      No      Yes     No
                Utility U(A, C1)      3       3       1       3
C2 (6 rules)    Values in C2          1, 2    3, 4    1, 2    2
                Value sharing         Yes     No      Yes     No
                Utility U(A, C2)      2       3       1       3
C3 (3 rules)    Values in C3          1, 4    1       2, 3    3
                Value sharing         Yes     No      Yes     No
                Utility U(A, C3)      2       3       1       3
Total attribute utility               7       9       3       9

Figure 1: Determining the total utility of the attributes.
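The two iterations of the example can be replayed with a deliberately simplified Python sketch of the selection loop. It only reproduces the order in which attributes are chosen, and all names are hypothetical. Its behaviour:

- It ranks candidates by total utility, then by the number of values in the rules, and falls back to alphabetical order instead of a random choice.
- It removes the rules containing the chosen attribute, as described in the text before Figure 1.
- It does not build branches or “don't care” links.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Elementary = Dict[str, int]  # attribute -> one value


def expand(rule: Dict[str, Tuple[int, ...]]) -> List[Elementary]:
    """Multiply out internal disjunctions into elementary rules."""
    keys = list(rule)
    return [dict(zip(keys, vals)) for vals in product(*(rule[k] for k in keys))]


def total_utility(sets: List[Set[int]]) -> int:
    """Formula (1): sum over classes of (1 + number of disjoint other classes)."""
    return sum(1 + sum(1 for j, vj in enumerate(sets) if j != i and not (vi & vj))
               for i, vi in enumerate(sets))


def selection_order(rules: Dict[str, List[Elementary]]) -> List[str]:
    """Repeatedly pick the best attribute, mark it as in the tree and drop
    every rule that contains it, until no rules are left."""
    rules = {c: list(rs) for c, rs in rules.items()}
    chosen: List[str] = []
    while any(rules.values()):
        candidates = sorted({a for rs in rules.values() for r in rs for a in r} - set(chosen))
        if not candidates:
            break

        def score(a: str) -> Tuple[int, int]:
            sets = [{r[a] for r in rs if a in r} for rs in rules.values()]
            return total_utility(sets), -len(set().union(*sets))  # fewer values is better

        best = max(candidates, key=score)  # first (alphabetical) among equal scores
        chosen.append(best)
        rules = {c: [r for r in rs if best not in r] for c, rs in rules.items()}
    return chosen


DNF = {
    "C1": [{"x1": (3,), "x2": (2,)}, {"x1": (3,), "x3": (1, 3), "x4": (1,)}],
    "C2": [{"x1": (1, 2), "x2": (3, 4)}, {"x1": (2,), "x3": (1, 2), "x4": (2,)}],
    "C3": [{"x1": (1,), "x2": (1,)}, {"x1": (4,), "x3": (2, 3), "x4": (3,)}],
}
RULES = {c: [e for r in rs for e in expand(r)] for c, rs in DNF.items()}
print(selection_order(RULES))  # ['x4', 'x2']
```

In the second iteration, only x1 (utility 7) and x2 (utility 9) still occur in the remaining rules. The sketch therefore selects x2, as stated in the text.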


3.  Analyzing  Decision  Trees  Obtained  by  AQDT-1 

The performance of the AQDT-1 method was evaluated by applying it to several learning problems. The most complex problem was to learn decision rules characterizing the voting records of the US Congress. Each voting record was described in terms of 19 multivalued attributes. AQDT-1, when run in the conventional mode (no "skip-nodes") and with the attribute selection criterion minimizing the number of nodes, produced a decision tree with 20 nodes and 91.8% predictive accuracy on the testing examples. For comparison, the well-known decision tree learning program C4.5 was run on exactly the same data. C4.5 produced a decision tree with 22 nodes and 85.7% predictive accuracy. (Both programs were assumed to produce a decision tree that is complete and consistent with regard to the training examples, i.e., a decision tree that gives 100% correct recognition of the training examples.) AQDT-1 was also run in the "skip-node" mode. As shown in Figure 2, the resulting "skip-node" decision tree has only 13 nodes. If the value of "Food_stamp_cap" is 0 or 1, one can make the decision "Democrat" or "Republican", respectively. The "Don't care" branch stemming from "Food_stamp_cap" allows one to proceed to the next node (i.e., to evaluate the next attribute, "Occupation") without knowing the value of "Food_stamp_cap".

The idea of "Don't care" branches (or "skip nodes") allows one to build a decision tree from rules that do not intersect logically. Introducing such "Don't care" branches not only makes the resulting "skip-node" decision tree simpler, but may also allow one to reach a decision when the values of some attributes are unknown. In this way, a "skip-node" decision tree makes it possible in some situations to avoid measuring an attribute when doing so is not logically necessary, which is a problem with conventional decision trees (Michalski, 1990).
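A minimal sketch of how a "Don't care" branch lets classification proceed when an attribute value is unknown. The node layout, the classify function and the Occupation branches are hypothetical; only the Food_stamp_cap decisions (0 gives Democrat, 1 gives Republican) come from the text. The sketch also ignores the confirmation step across subtrees that AQDT-1 performs.

```python
# Hypothetical node layout: {"attr": name, "branches": {value: child}, "dont_care": child}.
# A leaf is a class-label string.
TREE = {
    "attr": "Food_stamp_cap",
    "branches": {0: "Democrat", 1: "Republican"},
    # Don't care branch: lets evaluation move on to the next attribute.
    "dont_care": {
        "attr": "Occupation",
        "branches": {0: "Republican", 1: "Democrat"},  # made-up branches
        "dont_care": None,
    },
}


def classify(node, example):
    """Follow value branches; take the Don't care branch when no value branch applies."""
    while isinstance(node, dict):
        value = example.get(node["attr"])  # None models an unknown value
        if value in node["branches"]:
            node = node["branches"][value]
        elif node["dont_care"] is not None:
            node = node["dont_care"]
        else:
            return None  # no applicable branch: no decision reached
    return node


if __name__ == "__main__":
    print(classify(TREE, {"Food_stamp_cap": 1}))  # Republican
    print(classify(TREE, {"Occupation": 1}))      # Democrat (Food_stamp_cap unknown)
    print(classify(TREE, {}))                     # None
```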

Let us define the accuracy of a decision tree against a set of examples as the percentage of those examples that are correctly classified by the tree. We will distinguish between two types of accuracy: the level accuracy and the accumulative accuracy. The level accuracy of a leaf node is the percentage of examples correctly classified at this node. The level accuracy at a non-leaf node is the percentage of examples that are not classified into any class at that node. The accumulative accuracy of a leaf is the total percentage of examples correctly classified up to this point, including those classified correctly at higher nodes.

In Figure 2, L denotes the level accuracy and A denotes the accumulative accuracy at a given node. For example, the leaf node "Republican" at the first level of the tree has a level accuracy of 13/20 (65%). The "Republican" nodes at the subsequent levels have level accuracies of 15/20 (75%), 19/20 (95%), and 11/20 (55%), respectively. In contrast, the accumulative accuracy of the "Republican" nodes at the different levels is 13/20 (65%), 18/20 (90%), 20/20 (100%), and 20/20 (100%). The accumulative accuracy at the second level was computed by adding the 13 examples classified correctly at level one to the 5 examples classified correctly among the 13 examples left unclassified at the first level.
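The two accuracy measures for a leaf class can be sketched as follows, assuming we know, for each level, which testing examples of that class are classified correctly there. The accumulative figure counts the union over levels, so an example already classified at a higher node is not counted twice (which is why 15/20 at level two adds only 5 new examples to 13/20). The function name and the example-id sets in the usage are toy illustrations, not the paper's data.

```python
from fractions import Fraction


def level_and_accumulative(correct_by_level, n_examples):
    """Return (level accuracy, accumulative accuracy) for each tree level.

    correct_by_level[k] is the set of example ids that the leaf at level k
    classifies correctly; n_examples is the number of examples of that class.
    """
    classified_so_far = set()
    rows = []
    for correct in correct_by_level:
        classified_so_far |= correct
        rows.append((Fraction(len(correct), n_examples),
                     Fraction(len(classified_so_far), n_examples)))
    return rows


if __name__ == "__main__":
    # Toy data: 10 examples, three levels.
    for level, (l, a) in enumerate(level_and_accumulative(
            [{0, 1, 2, 3}, {2, 3, 4, 5, 6}, {0, 7}], 10), start=1):
        print(f"level {level}: L={l}, A={a}")
```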


Figure 2: AQDT-1 derived "skip-node" decision tree for the US Congress Voting data (L: level accuracy; A: accumulative accuracy at a node).


Let us explain how the AQDT-1-generated tree classifies testing examples. First, it forms an initial hypothetical conclusion by matching the first subtree, which consists of the root and its leaves (Figure 2); then it confirms or contradicts this hypothesis through the other subtrees (reached through the Don't care nodes). AQDT-1 uses the level accuracy to choose a conclusion in cases where there is a contradiction with the initial hypothesis. The testing example (Food_stamp_cap=0, Occupation=0, Cas_cont_ban=0, Income=2), or (0,0,0,2), yields the initial conclusion "Democrat" with accuracy 80.6% (25/31). During the confirmation process, we notice a contradiction from the second and third subtrees. The percentages for these subtrees are 75*13/51=19.1% and 95*6/51=11.1%, respectively. The last path has no branch for value 2, so we count this case against the initial conclusion with percentage 55*2/51=2.2%. The contradictions thus total 19.1+11.1+2.2=32.4%. If the difference (80.6-32.4=48.2%) is less than 50% (in the case of two classes), the final conclusion is the contradictory one; here the example is therefore classified as Republican.

Otherwise, and also in cases where some subtrees support the initial conclusion, the following method is used. Consider the example (0,0,1,1), for which only the third subtree supports the initial conclusion. Based on the information in Figure 2, we assume that the level accuracy of each contradictory path is 100%, take the contradictory conclusion as the initial one, and then compare the contradictory percentage for each conclusion. In our example, we take the level accuracy of "Occupation" as 100%. At that level, 15 Republican and 8 Democrat examples are classified correctly, which leaves 28 unclassified examples (28*100/51=54.9%). Thus 54.9% becomes the level accuracy of "Food_stamp_cap", and 54.9*25/31=44.7% is the contradictory percentage for the class Republican. That is, if the correct concept is Republican, the contradictory percentage is 44.7% from the first subtree, but if it is Democrat, the contradictory percentage is 19.1% from the second subtree. In this example, the contradictory percentage of the "Income" subtree is also less than 44.7%.
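The arithmetic of the first decision rule above can be sketched as follows. We read each contradiction weight as the subtree's level accuracy times the number of examples still unclassified when that subtree is reached, divided by the 51 testing examples (for instance, 51 - 13 - 25 = 13 before the second subtree); this reading, the 20 + 31 split of the testing set, and the function names are our assumptions. The sketch covers only the two-class case in which every other subtree contradicts the initial conclusion.

```python
TOTAL_TESTING = 51  # assumed: 20 Republican + 31 Democrat testing examples


def contradiction_weight(level_accuracy_pct, n_unclassified, total=TOTAL_TESTING):
    """Share (in %) of a contradicting subtree: level accuracy x fraction reaching it."""
    return level_accuracy_pct * n_unclassified / total


def first_rule(initial_pct, contradictions, contradictory_class, threshold=50.0):
    """Two-class case in which every other subtree contradicts the initial conclusion.

    Returns (decision, margin); decision is None when the margin is not below the
    threshold, i.e. when the comparison method described above must be used instead.
    """
    against = sum(contradiction_weight(acc, n) for acc, n in contradictions)
    margin = initial_pct - against
    return (contradictory_class if margin < threshold else None), margin


if __name__ == "__main__":
    initial = 100 * 25 / 31  # "Democrat" with 80.6%
    decision, margin = first_rule(initial, [(75, 13), (95, 6), (55, 2)], "Republican")
    print(decision, round(margin, 1))  # Republican 48.2
```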

4.  Decision  Tree  Pruning 

When input data may contain errors (noise), it is often useful to prune the obtained decision tree. Such pruning protects the tree from overfitting. Various approaches have been described in (Mingers, 1989; Breiman et al., 1984; Quinlan, 1986, 1987; Niblett & Bratko, 1986; Cestnik et al., 1987; Clark & Niblett, 1987; Smyth et al., 1990; Cestnik & Bratko, 1991). These pruning approaches differ in the criteria they use to decide whether or not to prune at a given stage.

In this paper, we study the meaning and effect of pruning a decision tree learned from rules. Because of the structure of such a tree, deciding where to prune is simpler. Removing one node of the decision tree is equivalent to removing from the rules all the alternative conjunctions that contain the attribute with that value. Pruning is especially significant here because of our assumption that the rule base is complete (AQ15 guarantees a 100% match with the training examples). Our pruning approach takes into consideration that the decision tree is learned from rules, not from examples. The pruning strategy removes whole nodes at the lowest level together with their Don't care node. Figure 3 shows the decision tree learned from the Congressional Voting rules. Pruning takes place at the dotted lines, and the numbers on the left give the accuracy after replacing the Don't care node with the associated decision.


Figure 3: A "skip-node" decision tree and various pruning points at the dotted lines.
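The statement above, that removing a node amounts to removing every conjunction containing that attribute value, can be illustrated on elementary rules. This is our own sketch with a hypothetical function name, using the elementary rules obtained earlier for C1; the paper's pruning itself operates on whole lowest-level nodes together with their Don't care node.

```python
def prune_condition(rules, attr, value):
    """Remove every elementary rule that tests attr == value (the effect of deleting that branch)."""
    return {cls: [r for r in rs if r.get(attr) != value] for cls, rs in rules.items()}


# Elementary rules of C1 from the DNF example (one value per attribute).
ELEMENTARY = {
    "C1": [{"x1": 3, "x2": 2},
           {"x1": 3, "x3": 1, "x4": 1},
           {"x1": 3, "x3": 3, "x4": 1}],
}

if __name__ == "__main__":
    print(prune_condition(ELEMENTARY, "x4", 1))  # {'C1': [{'x1': 3, 'x2': 2}]}
```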


5.  Comparing  Decision  Trees  from  AQDT-1  and  C4.5 

This section presents an experiment comparing pruned decision trees learned from rules by AQDT-1 with those learned from examples by C4.5. Both programs were applied to the same US congressional voting data described above. The results presented here (for all experiments) are the best results from running C4.5 with both the default window (10 trials; the initial window is the larger of 20% of the examples and twice the square root of the number of examples) and a 100% window (10 trials). C4.5 created a tree with 23 nodes before pruning. After pruning, the tree had 7 nodes with an error rate of 11.8%.

Figure 4 compares the accuracy of the C4.5 and AQDT-1 decision trees at different degrees of pruning. The comparison is done on the training examples (Figure 4a) and on the testing examples (Figure 4b). Figure 4 shows that the trees learned from rules match the examples better than the trees learned from examples, across the different degrees of pruning.


Figure 4: The increase of the accuracy of AQDT-1 and C4.5 generated trees with the number of nodes (panel a: training examples; panel b: testing examples; horizontal axis: number of nodes in the tree; vertical axis: accuracy, %).


The rest of this section describes two experiments comparing AQDT-1 with C4.5. The experiments were done to show the effect of the size of the source data on AQDT-1 and C4.5 in terms of the number of nodes and accuracy. In these experiments, the accuracy is calculated on the testing examples. The first experiment involved the second congressional voting data set, which consisted of 112 examples, two concepts to be learned, and 102 testing examples. Figure 5 shows how the accuracy and the number of nodes depend on the relative size of the training set. The second experiment used the so-called MONK1 data, which consists of 124 training examples, two concepts, and 432 testing examples. Figure 6 shows the dependence of the accuracy and the number of nodes on the relative size of the training set for this problem.


Figure 5: Comparing decision trees for the US Congressional Voting problem (horizontal axis of both panels: relative size of training data, %).

Figure 6: Comparing decision trees for the MONK1 problem (horizontal axis of both panels: relative size of training data, %).


Figures 5 and 6 show that AQDT-1 produced decision trees with fewer nodes and higher accuracy than C4.5.

6.  Conclusion 

The paper presented the AQDT-1 method for efficiently determining decision trees from decision rules. The main difference between determining trees from decision rules and determining them from examples lies in the attribute selection function. In the former case, the attribute selection criterion evaluates the role of attributes in the rules, while in the latter it evaluates the coverage of training examples. The primary property used in the proposed method is the "total utility" of an attribute for reducing the decision rules.

The method employs the AQ15 inductive learning program to learn decision rules from examples. A major advantage of the proposed approach is that it helps determine decision trees suitable for different decision-making situations, for example, when one wants to avoid measuring some "expensive" attribute. Another advantage is that the method can build decision trees with "Don't care" values on some branches. It has been demonstrated that decision trees with such "Don't care" values can be significantly simpler than conventional decision trees. In the experiments, the AQDT-1 method outperformed the C4.5 method for learning decision trees from examples, both in terms of predictive accuracy and the simplicity of the generated decision trees.




ACKNOWLEDGMENTS 

The authors thank M. Hieb, K. Kaufman, J. Wnek, M. Maloof, H. Vafaie and E. Bloedorn for valuable comments and suggestions.

This research was supported in part by the National Science Foundation under grant No. IRI-9020266, in part by the Defense Advanced Research Projects Agency under grant No. N00014-91-J-1854, administered by the Office of Naval Research, and grant No. F49620-92-J-0549, administered by the Air Force Office of Scientific Research, and in part by the Office of Naval Research under grant No. N00014-91-J-1351.

REFERENCES 

Bratko, I. & Lavrac, N. (Eds.), Progress in Machine Learning, Sigma Press, Wilmslow, England, 1987.

Bratko, I. & Kononenko, I., "Learning Diagnostic Rules from Incomplete and Noisy Data," Interactions in AI and Statistics, B. Phelps (Ed.), Gower Technical Press, 1987.

Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J., Classification and Regression Trees, Wadsworth Int. Group, Belmont, California, 1984.

Cestnik, B. & Karalic, A., "The Estimation of Probabilities in Attribute Selection Measures for Decision Tree Induction," Proceedings of the European Summer School on Machine Learning, Priory Corsendonk, Belgium, July 22-31, 1991.

Clark, P. & Niblett, T., "Induction in Noisy Domains," Progress in Machine Learning, I. Bratko & N. Lavrac (Eds.), Sigma Press, Wilmslow, 1987.

Hunt, E., Marin, J. & Stone, P., Experiments in Induction, Academic Press, New York, 1966.

Michalski, R.S., "AQVAL/1: Computer Implementation of a Variable-Valued Logic System VL1 and Examples of its Application to Pattern Recognition," Proceedings of the First International Joint Conference on Pattern Recognition, pp. 3-17, 1973.

Michalski, R.S., Mozetic, I., Hong, J. & Lavrac, N., "The Multi-Purpose Incremental Learning System AQ15 and Its Testing Application to Three Medical Domains," Proceedings of AAAI-86, Philadelphia, PA, 1986.

Michalski, R.S., "Learning Flexible Concepts: Fundamental Ideas and a Method Based on Two-tiered Representation," Machine Learning: An Artificial Intelligence Approach, Vol. III, Y. Kodratoff & R.S. Michalski (Eds.), Morgan Kaufmann, pp. 63-111, 1990.

Mingers, J., "An Empirical Comparison of Selection Measures for Decision-Tree Induction," Machine Learning, Vol. 3, No. 4, pp. 319-342, Kluwer Academic Publishers, 1989.

Niblett, T. & Bratko, I., "Learning Decision Rules in Noisy Domains," Proceedings of Expert Systems 86, Brighton, Cambridge University Press, Cambridge, 1986.

Quinlan, J.R., "Discovering Rules by Induction from Large Collections of Examples," Expert Systems in the Microelectronic Age, D. Michie (Ed.), Edinburgh University Press, 1979.

Quinlan, J.R., "Learning Efficient Classification Procedures and Their Application to Chess End Games," Machine Learning: An Artificial Intelligence Approach, R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Morgan Kaufmann, Los Altos, 1983.

Quinlan, J.R., "Induction of Decision Trees," Machine Learning, Vol. 1, No. 1, Kluwer Academic Publishers, 1986.

Smyth, P., Goodman, R.M. & Higgins, C., "A Hybrid Rule-based/Bayesian Classifier," Proceedings of ECAI 90, Stockholm, August 1990.


Integrating Machine Learning Techniques in
Knowledge-Based Systems Verification


Hakim  Lounis 

Laboratoire de Recherche en Informatique
Université de Paris-Sud, Bâtiment 490
91405 Orsay Cedex, FRANCE
email: lounis@lri.lri.fr, tel: (33) 69 41 64 09


Abstract. A significant problem in the development of Knowledge-Based Systems (KBS) is their verification step. This paper describes an expert system verification approach that considers system specifications, and consequently Knowledge Bases (KB), to be only partially described when development starts. This partial description is not necessarily perfect, and our work aims at using Machine Learning techniques to progressively improve the quality of expert system Knowledge Bases by coping with two major KB anomalies: incompleteness and incorrectness. In agreement with the current tendency, the KBs considered in our approach are expressed in different formalisms. Results obtained with two different learning algorithms confirm the hypothesis that integrating machine learning techniques into the verification step of a Knowledge-Based System life cycle is a promising approach.

Keywords. Verification, Formal Specifications, Machine Learning, Revision Process, Production Rules, Semantic-Net, Integrity Constraints


1.  Introduction 

Nowadays, there are still few commercialized Knowledge-Based Systems (KBS). The major reason is the lack of a strict validation step in their life cycle. There is widespread agreement that KBSs cannot be designed in a linear fashion. This is due to the typical problems they have to solve, which require either adapting traditional software development techniques or using new techniques relevant to Artificial Intelligence (AI) systems. For instance, the life cycle of Figure 1-a does not seem advisable for KBS development. The model shown in Figure 1-b allows the developer to describe the KBS specifications partially; these specifications are then progressively completed through each new cycle.


Figure 1: Different life cycles


To avoid confusion due to the lack of a unified terminology, we adopt in this paper Laurent's terminology [1] for the validation process. We use different concepts for validation purposes. Some are formalizable (e.g., circularity of a rule base, redundancy of a rule base), and some are not (e.g., level of performance, explanation capabilities). From these concepts we may set up specifications, always in a formal way. However, we obtain either truly formal specifications (from the formalizable concepts) or what [1] calls pseudo-formal specifications. For example, to deal with explanation capabilities, we can define a formal process: an ad hoc questionnaire is filled in by ten users, each answer earns points, and a formal aggregation process produces a final score N on a given scale (e.g., [0 .. 100]). If we choose a threshold, e.g., 80, we can then set up a pseudo-formal specification: N ≥ 80. This leads to the following definitions:

Definition 1: A validation process is a process that attempts to determine whether or not a KBS satisfies one of its specifications.

Definition 2: A verification process is a validation process that attempts to determine whether or not a KBS satisfies one of its formal specifications.

Definition 3: An evaluation process is a validation process that attempts to determine whether or not a KBS satisfies one of its pseudo-formal specifications.

The most fundamental difference between a verification process and an evaluation process concerns the interpretation of the result. With verification, it is always possible to conclude whether or not the KBS fits the formal specification, as well as the corresponding informal specification in natural language. This is not the case with an evaluation process, because we cannot conclude without ambiguity whether or not the KBS satisfies the informal specification that led to the pseudo-formal specification.

This paper presents an expert system verification approach that considers system specifications to be partially described when development starts. This partial description is not necessarily perfect, and our work aims at using Machine Learning techniques to progressively improve the quality of expert system Knowledge Bases (KB) by coping with two major KB anomalies: incompleteness and incorrectness. By integrating Machine Learning techniques into the validation step of an evolutionary KBS life cycle model (e.g., Figure 1-b), we allow experts to propose an initial version of the KB, which is then refined and corrected throughout a refinement cycle. This addresses a drawback of pure inductive learning algorithms (e.g., ID3 [2], AQ [3], ...): induced rules are often not directly usable by current expert system shells. The reason is that such rules are generally flat and so do not give the KBS explanation capabilities. We believe that starting with an initial, possibly imperfect KB, which is refined until it reaches its final expression, is better than using pure inductive learning directly. Figure 2 presents a view similar to the previous one, except that here we deal with the life cycle of a KB:


Figure 2: Life cycle of a KB


In our approach, we consider expert systems whose KBs are expressed in different formalisms (this is in accordance with the current tendency). Each part of the knowledge is represented in the formalism considered most appropriate for the type of knowledge it expresses. The knowledge we consider consists of three parts: shallow knowledge, a deeper kind of knowledge, and a set of examples. The first part is a set of production rules expressed in First Order Logic (FOL). The second part in turn consists of two categories of knowledge: semantic nets representing entities of the application domain and their relationships, and a set of integrity constraints. Lastly, the set of examples contains observations represented as conjunctions of first-order literals. Each observation is classified as belonging to a given concept.

In this paper, we first describe previous work in the field of KBS verification, compare it to our method, and highlight the strengths and weaknesses of each approach. We then present our approach and illustrate it with an example. In this example, in order to revise the KB, we have integrated two different learning algorithms, FOIL [4] and KBG [5], into the verification cycle.


2.  Related  works 

Earlier work in the field of KB verification has not integrated machine learning techniques to revise the initial formulation of KBs. It generally considers rule bases expressed in attribute-value logic or first-order logic. CHECK [6] statically verifies the consistency and completeness of KBs expressed in FOL. This method is typical of approaches that are easy to implement. Despite the simplicity of the basic concepts used in CHECK, a rule base may present incoherences that CHECK cannot detect.

To overcome this weakness, new approaches, referred to as dynamic, have evolved. These take into account the deductive power of knowledge bases. They are either exhaustive, in which case they aim at finding the specification of all incoherent situations, or heuristic, in which case they exploit heuristics to select the more "interesting" conjectures of incoherence. The exhaustive approach is used by systems like COVADIS [7] and COCO [8]. Starting with an incoherence specification, the issue is to prove that this specification is reachable from sets of "incoherent facts". If this is the case, the KB is incoherent; otherwise it is coherent. A typical example of a system adopting a heuristic approach is SACCO [9]. This approach speeds up computation and avoids proposing to the expert conjectures that are real but improbable. To do this, SACCO uses heuristics that define a limited set of potential conjectures of incoherence. If these conjectures are verified by an initial fact base, the incoherence is detected; otherwise, the KB is assumed coherent.

On the other hand, functional verification of a KBS makes sure that the results it provides are in accordance with the semantics of the domain. For instance, starting from expert knowledge and a set of cases, the SEEK [10] system uses a rule refinement cycle that evaluates the performance of the rules on a library of cases, analyzes the statistical behavior of each rule, and suggests modifications to be introduced into the expert knowledge.

More recently, several works have integrated machine learning algorithms in order to revise imperfect KBs automatically. They generally treat KBs expressed in the form of production rules. Some systems are only capable of generalizing an overly specific (incomplete) KB [11, 12, 13], while others are only capable of specializing an overly general KB [14, 15, 16].

A number of these systems use the Explanation-Based Learning (EBL) [17] approach to deal with imperfect KBs. For instance, to deal with overly general KBs, EBL/TS [18] uses the explanation tree of each positive example and then replaces the overly general definition of the given concept by the rule associated with this explanation tree. A-EBL [18], on the other hand, treats the problem of multiple explanations of an example. It throws out inconsistent explanations and retains only a minimal set of "good" explanations, using heuristics. Such approaches start with an imperfect KB expressed in terms of rules and replace the entire KB by the rule associated with the explanation tree, revising the operational definition of a given concept. These processes do not preserve the structural form of rule bases and generally produce flat rules.

However, the current tendency in knowledge representation aims at integrating different formalisms, such as rules, frames and semantic nets. The goal of such an approach is to increase the explanation capabilities of KBSs and to acquire different kinds of knowledge, each expressed in the formalism considered most appropriate for the type of knowledge it expresses. In the field of knowledge verification, there are still few systems that cope with imperfect KBs expressed in different formalisms. Our approach considers KBs expressed in different formalisms: rule bases, semantic nets and integrity constraints. It holds that machine learning techniques, used in an evolutionary life cycle, can propose refinements to the initial KB until it reaches a correct and complete expression. In this way, we deal with a drawback of pure inductive learning algorithms: induced rules are often not directly usable by expert system shells.


3.  Description  of  the  approach 

Our method starts from the hypothesis that the initial KB provided by an expert is probably incomplete and/or incorrect. The detection of incompleteness and/or incorrectness is centered on the notion of the label of a given concept.


Definition 4: The label of a concept is the set of all initial¹ facts (i.e., literals) that allow a KBS to deduce² the concept:

$E_{concept} = \bigvee_{i} \bigwedge_{j=1}^{n} F_{ij}$, where $F_{ij}$ is an initial first-order literal.

The knowledge provided to the system includes:

- Rules of expertise: $\{R_i \mid R_i = (\bigwedge_{k} P_{ik}) \Rightarrow Q_i\}$, where $P_{ik}$ and $Q_i$ are first-order literals.

- A semantic net describing domain entities and the relations between them.

- Integrity constraints: $\{I_k \mid I_k = \mathit{incompatible}(\text{concept-}i, \text{concept-}j)\}$, where concept-i and concept-j are first-order literals.

- A set of examples: $\{e \mid e = desc(e) \wedge class(e)\}$, where $desc(e) = \bigwedge_{j=1}^{p} L_j$ is the description of the example; this description is a conjunction of initial first-order literals, and class(e) is a first-order literal that indicates the concept to which the example belongs.

All literals are typed: the arguments of a given predicate take their values in a user-defined type. For instance, we have considered the following types: nominal, linear, integer, real and hierarchical.

In this context, an incorrectness corresponds to the case where an example belongs to a concept that is incompatible with its real concept. Such an incompatibility is stated by the meta-predicate incompatible(concept-i, concept-j). According to our definition, an incorrectness is detected if the following formal specification is verified:

(A) $\exists e: desc(e) = \bigwedge_{j=1}^{p} L_j \;\&\; class(e) = \text{concept-}i$, $\exists$ an integrity constraint $I = \mathit{incompatible}(\text{concept-}i, \text{concept-}j)$, $\exists$ a substitution $\sigma$ such that $desc(e) \Rightarrow (E_{\text{concept-}j})\sigma$.

On the other hand, an incompleteness corresponds to the case where an example does not belong to the label of its real concept. The formal specification associated with this informal one is the following:

(B) $\exists e: desc(e) = \bigwedge_{j=1}^{p} L_j \;\&\; class(e) = \text{concept-}i$, such that there is no substitution $\sigma$ for which $desc(e) \Rightarrow (E_{\text{concept-}i})\sigma$.
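Specifications (A) and (B) can be sketched in a propositional simplification: literals are ground strings, so every substitution σ is the identity, and a label is computed as the list of alternative sets of initial facts from which a concept can be derived through acyclic rules. The rule base, the concept names and the functions below are hypothetical illustrations, not the system's implementation.

```python
from itertools import product


def label(concept, rules, initial):
    """Alternative sets of initial facts from which `concept` is deducible.

    rules: list of (premises, conclusion) pairs over ground literals; assumed acyclic.
    initial: set of initial (operational) literals.
    """
    if concept in initial:
        return [frozenset({concept})]
    alternatives = []
    for premises, conclusion in rules:
        if conclusion == concept:
            # One alternative per combination of the premises' own alternatives.
            for combo in product(*(label(p, rules, initial) for p in premises)):
                alternatives.append(frozenset().union(*combo))
    return alternatives


def covers(desc, alternatives):
    """desc => E_concept: some alternative is contained in the description."""
    return any(alt <= desc for alt in alternatives)


def incorrectness(example, rules, initial, incompatible):
    """Spec (A): incompatible concepts whose label the example still satisfies."""
    desc, cls = example
    hits = []
    for a, b in incompatible:
        other = b if a == cls else a if b == cls else None
        if other is not None and covers(desc, label(other, rules, initial)):
            hits.append(other)
    return hits


def incomplete(example, rules, initial):
    """Spec (B): the example's description does not satisfy its own concept's label."""
    desc, cls = example
    return not covers(desc, label(cls, rules, initial))


# Hypothetical toy KB: "flier" is a sub-concept (a non-initial literal).
RULES = [(("bird", "healthy"), "flier"), (("flier",), "fly"), (("penguin",), "cannot_fly")]
INITIAL = {"bird", "healthy", "penguin", "has_feathers"}
INCOMPATIBLE = [("fly", "cannot_fly")]

if __name__ == "__main__":
    print(label("fly", RULES, INITIAL))
    print(incorrectness(({"bird", "healthy", "penguin"}, "cannot_fly"),
                        RULES, INITIAL, INCOMPATIBLE))  # ['fly']
    print(incomplete(({"bird", "has_feathers"}, "fly"), RULES, INITIAL))  # True
```

Real first-order labels would additionally require unification to find σ, which this sketch deliberately leaves out.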

As stated by [19], imperfections of the KB have many causes. In our case, when an incompleteness is detected, the revision process has to deal with several cases. It can induce a new rule, which has to be linked up at an appropriate place within the structural representation of the rules. It can also generalize an existing rule by dropping literal(s) from its left-hand side or by learning a more general literal. In the case of incorrectness, the revision process aims at specializing the label. This is done first by locating the faulty rule(s) and then by specializing their left-hand sides, either by learning new literals or by specializing an existing one. A rule can also be suppressed if it is satisfied exclusively by counter-examples of the studied concept.

Unlike many other approaches, our revision process concerns a faulty sub-concept. A sub-concept is defined by a non-initial literal that appears in the definition of the studied concept. Refinements proposed by the learning tool are performed by a learning algorithm that takes as input a set of examples, which depends on the kind of anomaly detected (incorrectness or incompleteness). In this way, the learning process concerns only the label of this sub-concept, without modifying the labels of other sub-concepts. This allows us to preserve the structural form of the rule base (i.e., the shallow knowledge). In the case of a detected incorrectness, the revision algorithm works with positive examples of the given concept and negative ones that are subsumed by the faulty concept label. In the case of incompleteness, the positive examples considered are those that verify the formal specification (B) and do not verify the current definition of the faulty sub-concept. The negative ones are those subsumed by the current sub-concept definition, and those imposed by integrity constraints. Note that for each example in the learning set, a new description, called a "contextual description", is proposed. It contains literals that are semantically close to the faulty concept. They are suggested by the semantic net or by a domain expert. The intervention of an expert may be necessary if it turns out that the semantic net is incomplete.

¹ Initial facts are used to describe examples. This notion is close to the notion of operationality introduced in EBL.

² The inference engine strategy considered is simple: a rule is fireable if its left-hand side is verified. If more than one rule is fireable at a given moment, the inference engine considers the first one in the KB. All fireable rules are considered.

This revision process may propose some modifications to the deep knowledge base. Such modifications arise when the initial vocabulary is not sufficient, in which case the semantic net has to be extended with an entity or a relation. These correspond respectively to a predicate argument or a predicate. Extending the initial vocabulary is made easier by the fact that the semantic net is organized in different pieces, each piece concerning a particular sub-concept (e.g., in the mammal example, the pieces correspond to particular functionalities).

We  can  summarize  our  verification  algorithm  as  follows: 

a) Determine the label $E_c$ of a target concept.

b) Determine all examples that are subsumed by the current label $E_c$.

c) Identify faulty concepts.

d) For each faulty concept:

- Build the learning set;

- For each example in the learning set, propose a contextual description;

- Learn a new piece of knowledge;

- Revise the KB.
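Steps a) and b) rest on unfolding a concept's rules down to initial literals. The following Python sketch is our own illustration, not the authors' implementation, and it makes simplifying assumptions. The KB is propositionalized, so each initial literal is a (predicate, value) pair about the example; a chain such as blood(x, y) ∧ temperature(y, hot) collapses to ("temperature", "hot"). Sub-concepts are referred to by plain names such as "way-of-develop", and the rule base is assumed to be non-recursive. A label is returned as a disjunction (list) of conjunctions (frozensets).

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Literal = Tuple[str, str]              # initial literal: (predicate, value)
Item = Union[str, Literal]             # a str names a sub-concept to unfold
Rules = Dict[str, List[List[Item]]]    # concept -> alternative rule bodies
Label = List[FrozenSet[Literal]]       # disjunction of conjunctions


def label(concept: str, rules: Rules) -> Label:
    """Unfold `concept` until only initial literals remain (non-recursive KB assumed)."""
    disjuncts: Label = []
    for body in rules[concept]:
        partial: Label = [frozenset()]
        for item in body:
            if isinstance(item, str):
                # Sub-concept: distribute over each of its own disjuncts.
                partial = [p | d for p in partial for d in label(item, rules)]
            else:
                partial = [p | {item} for p in partial]
        disjuncts.extend(partial)
    return disjuncts


# Propositionalized version of the mammal rule base of Section 4.
RULES: Rules = {
    "mammal": [["blood-system", "sexual-life", "locomotion"]],
    "blood-system": [[("temperature", "hot"), ("chambers", "4")]],
    "sexual-life": [[("fertilization", "internal"), "way-of-develop"]],
    "way-of-develop": [[("develop", "placenta"), ("reproduction", "viviparous"),
                        ("has", "breasts")]],
    "locomotion": [[("ante-limbs", "legs"), ("post-limbs", "legs"),
                    ("move", "ground")]],
}

E_mammal = label("mammal", RULES)
print(len(E_mammal), len(E_mammal[0]))  # one conjunction of 9 initial literals
```

If the two disjuncts learned for locomotion later in this section, ante-limbs(x, wings) and ante-limbs(x, fines), are appended to `RULES["locomotion"]`, the label becomes a disjunction of three conjunctions. This is how revising one sub-concept propagates to the label of the concept that uses it.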

Before illustrating our approach on a particular example, we summarize the entire revision process in Figure 3:


Figure 3: The revision process (figure not reproduced; recoverable labels: inputs "Semantic net", "Integrity constraints", "Example set" and "Rule base"; components "Determine the label of a given concept", "comparison between the label" and "KB Revision Tool"; outputs "Complete entities and relationships description" and "Correct and complete initial formulation of the rule base").


4.  An  example 

The knowledge base presented in this section is drawn from [20]. It is made up of different pieces of knowledge, each represented in the formalism considered most appropriate for the knowledge it expresses. The first is a rule base, expressed in first-order logic, that contains the expert's opinion on the mammal theory. The second piece of knowledge is a semantic net describing domain entities and their relationships. It is a deeper kind of knowledge than the expert's opinion. These two kinds of knowledge are shown below (the rules) and in Figure 4 (the semantic net):

mammal(x) ← blood-system(x, mammal) ∧ sexual-life(x, mammal) ∧ locomotion(x, mammal)
blood-system(x, mammal) ← blood(x, y) ∧ temperature(y, hot) ∧ heart(x, z) ∧ chambers(z, 4)
sexual-life(x, mammal) ← fertilization(x, internal) ∧ way-of-develop(x, mammal)
locomotion(x, mammal) ← ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)
way-of-develop(x, mammal) ← develop(x, placenta) ∧ reproduction(x, viviparous) ∧ has(x, breasts)

bird(x) ← reproduction(x, oviparous) ∧ blood(x, y) ∧ temperature(y, hot) ∧ mouth(x, bill)




Figure 4: The associated semantic net (figure not reproduced; recoverable node and link labels include mammal, blood, temperature, circulatory-system, part-of, heart, pump, chambers, breasts, post-limbs, legs, moving-environment, ground, linked-to, develop, placenta, pregnancy (mammal in [13 days, 640 days]), reproduction and viviparous).


In addition, integrity constraints are expressed in the following way:

incompatibility(mammal(x), bird(x))

incompatibility(way-of-develop(x, mammal), way-of-develop(x, bird))

Finally, we have a set of classified examples of mammals and birds. Some of them are listed in the following table:


cat: mammal(cat) ∧ blood(cat, b) ∧ temperature(b, hot) ∧ fertilization(cat, internal) ∧ develop(cat, placenta) ∧ reproduction(cat, viviparous) ∧ has(cat, breasts) ∧ ante-limbs(cat, legs) ∧ post-limbs(cat, legs) ∧ move(cat, ground) ∧ heart(cat, h) ∧ chambers(h, 4).

whale: mammal(whale) ∧ blood(whale, b) ∧ temperature(b, hot) ∧ fertilization(whale, internal) ∧ develop(whale, placenta) ∧ reproduction(whale, viviparous) ∧ has(whale, breasts) ∧ ante-limbs(whale, fines) ∧ post-limbs(whale, tail) ∧ move(whale, water) ∧ heart(whale, h) ∧ chambers(h, 4) ∧ size(whale, giant).

dolphin: mammal(dolphin) ∧ blood(dolphin, b) ∧ temperature(b, hot) ∧ fertilization(dolphin, internal) ∧ develop(dolphin, placenta) ∧ reproduction(dolphin, viviparous) ∧ has(dolphin, breasts) ∧ ante-limbs(dolphin, fines) ∧ post-limbs(dolphin, tail) ∧ move(dolphin, water) ∧ heart(dolphin, h) ∧ chambers(h, 4) ∧ behaviour(dolphin, friendly).

kangaroo: mammal(kangaroo) ∧ blood(kangaroo, b) ∧ temperature(b, hot) ∧ fertilization(kangaroo, internal) ∧ develop(kangaroo, marsupium) ∧ reproduction(kangaroo, viviparous) ∧ has(kangaroo, breasts) ∧ ante-limbs(kangaroo, legs) ∧ post-limbs(kangaroo, legs) ∧ move(kangaroo, ground) ∧ heart(kangaroo, h) ∧ chambers(h, 4).

bat: mammal(bat) ∧ blood(bat, b) ∧ temperature(b, hot) ∧ fertilization(bat, internal) ∧ develop(bat, marsupium) ∧ reproduction(bat, viviparous) ∧ has(bat, breasts) ∧ ante-limbs(bat, wings) ∧ post-limbs(bat, legs) ∧ move(bat, air) ∧ heart(bat, h) ∧ chambers(h, 4).

spiny-anteater: mammal(spiny-anteater) ∧ blood(spiny-anteater, b) ∧ temperature(b, hot) ∧ fertilization(spiny-anteater, internal) ∧ mouth(spiny-anteater, bill) ∧ reproduction(spiny-anteater, oviparous) ∧ has(spiny-anteater, breasts) ∧ ante-limbs(spiny-anteater, legs) ∧ post-limbs(spiny-anteater, legs) ∧ move(spiny-anteater, ground) ∧ heart(spiny-anteater, h) ∧ chambers(h, 4).

ornithorynchus: mammal(ornithorynchus) ∧ blood(ornithorynchus, b) ∧ temperature(b, hot) ∧ fertilization(ornithorynchus, internal) ∧ mouth(ornithorynchus, bill) ∧ reproduction(ornithorynchus, oviparous) ∧ has(ornithorynchus, breasts) ∧ ante-limbs(ornithorynchus, legs) ∧ post-limbs(ornithorynchus, legs) ∧ move(ornithorynchus, ground) ∧ heart(ornithorynchus, h) ∧ chambers(h, 4).

duck: bird(duck) ∧ blood(duck, b) ∧ temperature(b, hot) ∧ fertilization(duck, internal) ∧ mouth(duck, bill) ∧ reproduction(duck, oviparous) ∧ covered(duck, feathers) ∧ ante-limbs(duck, wings) ∧ post-limbs(duck, legs) ∧ move(duck, air) ∧ heart(duck, h) ∧ chambers(h, 4).

crow: bird(crow) ∧ blood(crow, b) ∧ temperature(b, hot) ∧ fertilization(crow, internal) ∧ mouth(crow, bill) ∧ reproduction(crow, oviparous) ∧ covered(crow, feathers) ∧ color(feathers, black) ∧ ante-limbs(crow, wings) ∧ post-limbs(crow, legs) ∧ move(crow, air) ∧ heart(crow, h) ∧ chambers(h, 4).

Table 1: Examples of mammals and birds

As mentioned above, our approach starts by determining a concept's label. For instance, if we want to study the initial definition of mammal, we obtain the following label:




E_mammal = blood(x, y) ∧ temperature(y, hot) ∧ heart(x, t) ∧ chambers(t, 4) ∧ fertilization(x, internal) ∧ develop(x, placenta) ∧ reproduction(x, viviparous) ∧ has(x, breasts) ∧ ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)

We can see that many mammal examples are not covered by the current label of the mammal concept. That is, $\exists e: \text{class}(e) = \text{mammal}$ such that there is no substitution $\sigma$ with $\text{desc}(e) \Rightarrow (E_{\text{mammal}})\sigma$. This is the case for e ∈ {whale, dolphin, kangaroo, bat, spiny-anteater, ornithorynchus}. On the other hand, examples of mammal, a concept incompatible with bird, are covered by the current bird label, i.e., $\exists e: \text{class}(e) = \text{concept-}j \wedge \text{incompatible}(\text{concept-}j, \text{bird})$ such that there is a substitution $\sigma$ with $\text{desc}(e) \Rightarrow (E_{\text{bird}})\sigma$. The examples concerned are {spiny-anteater, ornithorynchus}. In this case, both incompleteness and incorrectness are detected.
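Both checks above are specifications (A) and (B) applied to Table 1. The following is a minimal Python sketch under a propositional simplification of our own. Descriptions are sets of (predicate, value) pairs, and keyword names use "_" where the paper's predicates use "-". A label covers a description when one of its conjunctions is a subset of it, which stands in for finding a substitution σ. The intermediate literals blood(x, b) and heart(x, h) are folded into temperature and chambers. color(feathers, black) is dropped because it does not describe the example itself.

```python
from typing import Dict, FrozenSet, List, Tuple

Literal = Tuple[str, str]                  # (predicate, value) about the example
Example = Tuple[str, FrozenSet[Literal]]   # (class, description)
Label = List[FrozenSet[Literal]]           # disjunction of conjunctions


def ex(cls: str, **lits: str) -> Example:
    return cls, frozenset(lits.items())


COMMON = dict(temperature="hot", chambers="4", fertilization="internal")
BREASTS = dict(COMMON, has="breasts")
PLACENTA = dict(develop="placenta", reproduction="viviparous")
MARSUPIUM = dict(develop="marsupium", reproduction="viviparous")
BILL_EGGS = dict(mouth="bill", reproduction="oviparous")
LEGS = dict(ante_limbs="legs", post_limbs="legs", move="ground")
FINS = dict(ante_limbs="fines", post_limbs="tail", move="water")
WINGS = dict(ante_limbs="wings", post_limbs="legs", move="air")

EXAMPLES: Dict[str, Example] = {   # Table 1, propositionalized
    "cat": ex("mammal", **BREASTS, **PLACENTA, **LEGS),
    "whale": ex("mammal", **BREASTS, **PLACENTA, **FINS, size="giant"),
    "dolphin": ex("mammal", **BREASTS, **PLACENTA, **FINS, behaviour="friendly"),
    "kangaroo": ex("mammal", **BREASTS, **MARSUPIUM, **LEGS),
    "bat": ex("mammal", **BREASTS, **MARSUPIUM, **WINGS),
    "spiny-anteater": ex("mammal", **BREASTS, **BILL_EGGS, **LEGS),
    "ornithorynchus": ex("mammal", **BREASTS, **BILL_EGGS, **LEGS),
    "duck": ex("bird", **COMMON, **BILL_EGGS, **WINGS, covered="feathers"),
    "crow": ex("bird", **COMMON, **BILL_EGGS, **WINGS, covered="feathers"),
}

LABELS: Dict[str, Label] = {
    "mammal": [frozenset({**BREASTS, **PLACENTA, **LEGS}.items())],
    "bird": [frozenset({("reproduction", "oviparous"), ("temperature", "hot"),
                        ("mouth", "bill")})],
}
INCOMPATIBLE = {frozenset({"mammal", "bird"})}


def covers(label: Label, desc: FrozenSet[Literal]) -> bool:
    # Subset test stands in for "desc(e) => (E_c)sigma for some substitution sigma".
    return any(conj <= desc for conj in label)


def incompleteness(concept: str) -> List[str]:
    """Specification (B): examples of `concept` not covered by its own label."""
    return [n for n, (cls, desc) in EXAMPLES.items()
            if cls == concept and not covers(LABELS[concept], desc)]


def incorrectness(concept: str) -> List[str]:
    """Specification (A): examples of an incompatible class covered by the label."""
    return [n for n, (cls, desc) in EXAMPLES.items()
            if frozenset({cls, concept}) in INCOMPATIBLE
            and covers(LABELS[concept], desc)]


print(incompleteness("mammal"))
# ['whale', 'dolphin', 'kangaroo', 'bat', 'spiny-anteater', 'ornithorynchus']
print(incorrectness("bird"))  # ['spiny-anteater', 'ornithorynchus']
```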

To learn new definitions of faulty concepts, we have used two different learning algorithms: FOIL and KBG. FOIL learns Horn clauses from examples of relations (in our case, relations are the predicates of the application domain). The target relation is the faulty concept to revise. Like ID3, this algorithm employs an information-gain estimate to select the best literal. KBG, on the other hand, is a learning algorithm that uses a similarity measure to compute a distance between the different entities of a pair of examples.
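To make FOIL's literal selection concrete, here is a propositional sketch of its gain measure, $\mathrm{Gain}(L) = t\left(\log_2\frac{p_1}{p_1+n_1} - \log_2\frac{p_0}{p_0+n_0}\right)$, embedded in a sequential-covering loop. Here $p_0, n_0$ count the positive and negative examples covered before adding literal $L$, and $p_1, n_1$ count them after. This is a simplification, not FOIL itself. There are no variables and no new-variable literals, so $t = p_1$. Ties are broken in favour of the smallest (predicate, value) pair, which is our assumption. The data are the contextual descriptions of the locomotion learning set given below.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Literal = Tuple[str, str]            # (predicate, value) about the example
Descs = Dict[str, FrozenSet[Literal]]


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """FOIL gain for adding a literal, taking t = p1 (propositional case)."""
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def learn(pos: Descs, neg: Descs) -> List[List[Literal]]:
    """Sequential covering: grow each clause greedily by FOIL gain until it
    covers no negative example; the result is a disjunction of clauses."""
    remaining, clauses = dict(pos), []
    while remaining:
        clause: List[Literal] = []
        p_cov, n_cov = dict(remaining), dict(neg)
        while n_cov:
            cands = sorted({lit for d in p_cov.values() for lit in d} - set(clause))
            if not cands:
                raise ValueError("positive and negative examples are not separable")

            def gain(lit: Literal) -> float:
                p1 = sum(lit in d for d in p_cov.values())
                n1 = sum(lit in d for d in n_cov.values())
                return foil_gain(len(p_cov), len(n_cov), p1, n1)

            best = max(cands, key=gain)  # max keeps the first of tied candidates
            clause.append(best)
            p_cov = {k: d for k, d in p_cov.items() if best in d}
            n_cov = {k: d for k, d in n_cov.items() if best in d}
        clauses.append(clause)
        remaining = {k: d for k, d in remaining.items() if k not in p_cov}
    return clauses


def loc(ante: str, post: str, move: str) -> FrozenSet[Literal]:
    return frozenset({("ante-limbs", ante), ("post-limbs", post), ("move", move)})


POS = {"dolphin": loc("fines", "tail", "water"), "whale": loc("fines", "tail", "water"),
       "bat": loc("wings", "legs", "air")}
NEG = {n: loc("legs", "legs", "ground")
       for n in ("cat", "kangaroo", "spiny-anteater", "ornithorynchus")}
print(learn(POS, NEG))  # [[('ante-limbs', 'fines')], [('ante-limbs', 'wings')]]
```

On this learning set, the sketch returns ante-limbs(x, fines) ∨ ante-limbs(x, wings), the same disjunction reported below. Note that post-limbs(x, tail) and move(x, water) have exactly the same gain as ante-limbs(x, fines), so the choice depends on the tie-break. On the way-of-develop learning set of this section, the same sketch yields develop(x, marsupium) ∨ [reproduction(x, oviparous) ∧ has(x, breasts)].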

To deal with these anomalies, the revision process starts by identifying faulty concepts. For instance, it finds that both locomotion(x, mammal) and way-of-develop(x, mammal) are sub-concepts that need to be completed. To complete the locomotion definition, the revision process builds a learning set. Its positive examples are all mammal examples that are not covered by the current locomotion definition. Its negative examples are those covered by this definition and those imposed by integrity constraints. We obtain the following learning set:

POS = {dolphin, whale, bat}

NEG = {cat, kangaroo, spiny-anteater, ornithorynchus}
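The construction of these learning sets can be sketched in Python as follows. It uses the same propositional simplification as above and is our illustration, not the authors' code. Only the predicates relevant to the two faulty sub-concepts are kept. Because the mammal label is the conjunction of its sub-concepts, an example that fails a sub-concept definition also fails the label, so specification (B) is not rechecked separately. The `IMPOSED` table is a hypothetical encoding of the integrity constraint on way-of-develop.

```python
from typing import Dict, FrozenSet, Set, Tuple

Literal = Tuple[str, str]
Example = Tuple[str, FrozenSet[Literal]]   # (class, description)


def ex(cls: str, **lits: str) -> Example:
    return cls, frozenset(lits.items())


PLACENTA = dict(develop="placenta", reproduction="viviparous", has="breasts")
MARSUPIUM = dict(develop="marsupium", reproduction="viviparous", has="breasts")
OVIPAROUS = dict(reproduction="oviparous", has="breasts")
LEGS = dict(ante_limbs="legs", post_limbs="legs", move="ground")
FINS = dict(ante_limbs="fines", post_limbs="tail", move="water")
WINGS = dict(ante_limbs="wings", post_limbs="legs", move="air")

EXAMPLES: Dict[str, Example] = {
    "cat": ex("mammal", **PLACENTA, **LEGS),
    "whale": ex("mammal", **PLACENTA, **FINS),
    "dolphin": ex("mammal", **PLACENTA, **FINS),
    "kangaroo": ex("mammal", **MARSUPIUM, **LEGS),
    "bat": ex("mammal", **MARSUPIUM, **WINGS),
    "spiny-anteater": ex("mammal", **OVIPAROUS, **LEGS),
    "ornithorynchus": ex("mammal", **OVIPAROUS, **LEGS),
    "duck": ex("bird", reproduction="oviparous", covered="feathers", **WINGS),
    "crow": ex("bird", reproduction="oviparous", covered="feathers", **WINGS),
}

# Current (faulty) definitions of two sub-concepts of mammal.
SUB_CONCEPTS: Dict[str, FrozenSet[Literal]] = {
    "locomotion": frozenset(LEGS.items()),
    "way-of-develop": frozenset(PLACENTA.items()),
}
# Hypothetical encoding of incompatibility(way-of-develop(x, mammal),
# way-of-develop(x, bird)): examples of these classes become negatives.
IMPOSED: Dict[str, Set[str]] = {"way-of-develop": {"bird"}}


def learning_set(concept: str, sub: str) -> Tuple[Set[str], Set[str]]:
    definition = SUB_CONCEPTS[sub]
    pos: Set[str] = set()
    neg: Set[str] = set()
    for name, (cls, desc) in EXAMPLES.items():
        covered = definition <= desc
        if cls == concept and not covered:
            pos.add(name)   # fails the sub-concept, hence also the label (B)
        elif covered or cls in IMPOSED.get(sub, set()):
            neg.add(name)   # subsumed by the current definition, or imposed
    return pos, neg


pos, neg = learning_set("mammal", "locomotion")
print(sorted(pos), sorted(neg))
# ['bat', 'dolphin', 'whale'] ['cat', 'kangaroo', 'ornithorynchus', 'spiny-anteater']
```

Calling `learning_set("mammal", "way-of-develop")` gives the second learning set used later in this section, with duck and crow among the negatives because of the integrity constraint.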

For each of these examples, our revision process builds a contextual description. It contains literals that are semantically close to the faulty concept. For the previous learning set, we obtain these descriptions:

D_c(dolphin) = ante-limbs(dolphin, fines) ∧ post-limbs(dolphin, tail) ∧ move(dolphin, water)
D_c(whale) = ante-limbs(whale, fines) ∧ post-limbs(whale, tail) ∧ move(whale, water)
D_c(bat) = ante-limbs(bat, wings) ∧ post-limbs(bat, legs) ∧ move(bat, air)

D_c(cat) = ante-limbs(cat, legs) ∧ post-limbs(cat, legs) ∧ move(cat, ground)
D_c(kangaroo) = ante-limbs(kangaroo, legs) ∧ post-limbs(kangaroo, legs) ∧ move(kangaroo, ground)
D_c(spiny-anteater) = ante-limbs(spiny-anteater, legs) ∧ post-limbs(spiny-anteater, legs) ∧ move(spiny-anteater, ground)
D_c(ornithorynchus) = ante-limbs(ornithorynchus, legs) ∧ post-limbs(ornithorynchus, legs) ∧ move(ornithorynchus, ground)
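Selecting a contextual description amounts to keeping only those literals of an example that the semantic net links to the faulty sub-concept. In the Python sketch below, that link is approximated by a hypothetical, hand-written set of predicates per sub-concept (`PIECES`). The paper's selection follows the semantic net, or an expert, and can be richer. For example, the way-of-develop description of duck given later also keeps covered(duck, feathers).

```python
from typing import Dict, FrozenSet, Set, Tuple

Literal = Tuple[str, str]   # (predicate, value) about the example

# Hypothetical: the predicates that a semantic-net piece links to a sub-concept.
PIECES: Dict[str, Set[str]] = {"locomotion": {"ante-limbs", "post-limbs", "move"}}


def contextual_description(desc: FrozenSet[Literal], sub_concept: str) -> FrozenSet[Literal]:
    """Keep only the literals whose predicate belongs to the sub-concept's piece."""
    return frozenset(lit for lit in desc if lit[0] in PIECES[sub_concept])


whale = frozenset({("temperature", "hot"), ("fertilization", "internal"),
                   ("develop", "placenta"), ("reproduction", "viviparous"),
                   ("has", "breasts"), ("ante-limbs", "fines"), ("post-limbs", "tail"),
                   ("move", "water"), ("chambers", "4"), ("size", "giant")})
print(sorted(contextual_description(whale, "locomotion")))
# [('ante-limbs', 'fines'), ('move', 'water'), ('post-limbs', 'tail')]
```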

From both FOIL and KBG, we obtain the following new piece of knowledge:
locomotion(x, mammal) ← ante-limbs(x, wings) ∨ ante-limbs(x, fines).

This new expression completes the mammal label, and more precisely the locomotion one. We then obtain the following new definition:

locomotion(x, mammal) ← [ante-limbs(x, legs) ∧ post-limbs(x, legs) ∧ move(x, ground)] ∨ ante-limbs(x, wings) ∨ ante-limbs(x, fines).

At the same time, the learning algorithm completes the "locomotion" piece of the semantic net affected by the previous modifications. It has to extend its initial vocabulary:




Figure 5: Completion of the "locomotion" piece in the semantic net

To complete the solution of the incompleteness problem, the revision process considers as positive examples kangaroo, bat, spiny-anteater and ornithorynchus, which are not covered by the current definition of way-of-develop(x, mammal). The negative ones are cat, whale, dolphin, duck and crow. Note that duck and crow are added to the set of counter-examples because of the integrity constraint incompatibility(way-of-develop(x, mammal), way-of-develop(x, bird)). In this case, we obtain the following contextual descriptions:

D_c(kangaroo) = reproduction(kangaroo, viviparous) ∧ has(kangaroo, breasts) ∧ develop(kangaroo, marsupium).

D_c(bat) = reproduction(bat, viviparous) ∧ has(bat, breasts) ∧ develop(bat, marsupium).

D_c(spiny-anteater) = reproduction(spiny-anteater, oviparous) ∧ has(spiny-anteater, breasts).

D_c(ornithorynchus) = reproduction(ornithorynchus, oviparous) ∧ has(ornithorynchus, breasts).

D_c(cat) = reproduction(cat, viviparous) ∧ has(cat, breasts) ∧ develop(cat, placenta).

D_c(dolphin) = reproduction(dolphin, viviparous) ∧ has(dolphin, breasts) ∧ develop(dolphin, placenta).

D_c(whale) = reproduction(whale, viviparous) ∧ has(whale, breasts) ∧ develop(whale, placenta).

D_c(duck) = reproduction(duck, oviparous) ∧ covered(duck, feathers).

D_c(crow) = reproduction(crow, oviparous) ∧ covered(crow, feathers).

From this learning set, FOIL induces another definition of way-of-develop(x, mammal): [reproduction(x, oviparous) ∧ has(x, breasts)] ∨ develop(x, marsupium). KBG, on the other hand, learns the following definition: [reproduction(x, oviparous) ∧ has(x, breasts)] ∨ [develop(x, marsupium) ∧ has(x, breasts) ∧ reproduction(x, viviparous)].

Finally, the revision process completes the previous way-of-develop(x, mammal) definition as follows:

way-of-develop(x, mammal) ← [develop(x, development-organ) ∧ reproduction(x, viviparous) ∧ has(x, breasts)] ∨ [reproduction(x, oviparous) ∧ has(x, breasts)].

As in the previous case, we complete the portion of the semantic net concerned with mammal sexual life. We add the facts that both marsupium and placenta are development organs, and that some mammals are oviparous:


Figure 6: Completion of the "reproduction" piece in the semantic net (figure not reproduced; recoverable labels: has, linked-to, develop, pregnancy, uterus, placenta, mammal in [13 days, 640 days], reproduction, development-organ, marsupium, viviparous, oviparous).
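Matching develop(x, development-organ) against develop(cat, placenta) relies on the hierarchic type of the argument. A value satisfies a more general value if it lies below it in the semantic net's is-a hierarchy. The Python sketch below assumes a hypothetical `IS_A` table holding the two links added in Figure 6, together with the propositional encoding used above. It checks that the revised definition covers every mammal of Table 1 and excludes the two birds.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Literal = Tuple[str, str]   # (predicate, value) about the example

# is-a links added to the semantic net: value -> more general value (hypothetical table).
IS_A: Dict[str, str] = {"placenta": "development-organ", "marsupium": "development-organ"}


def is_a(value: str, general: str) -> bool:
    """True if value equals general or lies below it in the (acyclic) hierarchy."""
    current: Optional[str] = value
    while current is not None:
        if current == general:
            return True
        current = IS_A.get(current)
    return False


def satisfies(desc: FrozenSet[Literal], conj: List[Literal]) -> bool:
    # Each literal of the conjunction must be matched by a literal of the
    # description with the same predicate and a value at or below it.
    return all(any(p == q and is_a(v, w) for p, v in desc) for q, w in conj)


# Revised way-of-develop(x, mammal), as a list of conjunctions.
WAY_OF_DEVELOP: List[List[Literal]] = [
    [("develop", "development-organ"), ("reproduction", "viviparous"), ("has", "breasts")],
    [("reproduction", "oviparous"), ("has", "breasts")],
]

PLACENTA = frozenset({("develop", "placenta"), ("reproduction", "viviparous"), ("has", "breasts")})
MARSUPIUM = frozenset({("develop", "marsupium"), ("reproduction", "viviparous"), ("has", "breasts")})
OVIPAROUS = frozenset({("reproduction", "oviparous"), ("has", "breasts")})
BIRD = frozenset({("reproduction", "oviparous"), ("covered", "feathers")})
DESCS = {"cat": PLACENTA, "whale": PLACENTA, "dolphin": PLACENTA,
         "kangaroo": MARSUPIUM, "bat": MARSUPIUM,
         "spiny-anteater": OVIPAROUS, "ornithorynchus": OVIPAROUS,
         "duck": BIRD, "crow": BIRD}

covered = sorted(n for n, desc in DESCS.items()
                 if any(satisfies(desc, c) for c in WAY_OF_DEVELOP))
print(covered)
# ['bat', 'cat', 'dolphin', 'kangaroo', 'ornithorynchus', 'spiny-anteater', 'whale']
```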




To solve the incorrectness detected through the bird label, the revision process selects all bird examples and the two mammal examples (i.e., spiny-anteater and ornithorynchus) that revealed this incorrectness. We obtain the following learning set:

POS = {duck, crow}

NEG = {spiny-anteater, ornithorynchus}

D_c(duck) = covered(duck, feathers) ∧ ante-limbs(duck, wings) ∧ move(duck, air).

D_c(crow) = covered(crow, feathers) ∧ ante-limbs(crow, wings) ∧ move(crow, air).

D_c(spiny-anteater) = ante-limbs(spiny-anteater, legs) ∧ move(spiny-anteater, ground).

D_c(ornithorynchus) = ante-limbs(ornithorynchus, legs) ∧ move(ornithorynchus, ground).

The literal covered(x, feathers) is induced by FOIL. KBG learns the literal ante-limbs(x, wings).

Finally, the initial bird label is specialized, and we may obtain either of these two definitions:

bird(x) ← reproduction(x, oviparous) ∧ blood(x, y) ∧ temperature(y, hot) ∧ mouth(x, bill) ∧ covered(x, feathers).

or

bird(x) ← reproduction(x, oviparous) ∧ blood(x, y) ∧ temperature(y, hot) ∧ mouth(x, bill) ∧ ante-limbs(x, wings).

This example shows how our approach deals with an incomplete and/or incorrect KB. It relies on a kind of knowledge that is easier for a domain expert to produce than a set of rules: reliable examples and counter-examples of a given concept.


5.  Conclusion 

In this research, we address the issue of KB verification, where the KB consists of two differently structured levels of knowledge. This paper is centred on the following statement: integrating a learning process into the validation step of the KBS life cycle allows the detection and correction of anomalies present in the initial formulation of the KB.

Our approach exploits a set of examples, which experts can provide more easily than knowledge expressed directly in the form of rules. Thanks to the notion of a concept's label, this approach first makes it possible to locate the level at which incompleteness or incorrectness occurs. A subsequent step, which makes use of the examples concerned, calls on learning techniques that perform corrections at the level indicated by the incoherent label. In parallel, the description of the deep KB is completed to take the new changes into account. This process makes it possible to start with an initial, imperfect KB and then to correct and complete it until it reaches its final formulation.

We are currently exploring the possibility of learning integrity constraints. We aim to start with an integrity constraint $I_c = \text{incompatibility}(\text{concept-}i, \text{concept-}j)$, and then learn from the example set the incompatible environments, expressed in terms of initial facts, $I' = \text{incompatibility}(P_i, P_j)$. Such a process could reject an initial example description, and thus avoid the case where the learning algorithm has to deal with erroneous example descriptions.


Acknowledgement 

I would like to thank Yves Kodratoff, my thesis supervisor, for the support he gave to this work, and all the members of the Inference and Learning group at LRI.


References 


1. Laurent: "Vers une terminologie valide pour le domaine de la validation", Actes des JFVAV, pp 1-15, Dourdan, April 1992.

2. Quinlan: "Learning Efficient Classification Procedures and their Application to Chess End Games", in Machine Learning: An Artificial Intelligence Approach, R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Morgan Kaufmann 1983, pp 463-482.

3. Michalski: "A Theory and a Methodology of Inductive Learning", in Machine Learning: An Artificial Intelligence Approach, R.S. Michalski, J.G. Carbonell & T.M. Mitchell (Eds.), Morgan Kaufmann 1983, pp 83-134.

4. Quinlan: "Learning Logical Definitions from Relations", Machine Learning Journal, 5, pp 239-266, 1990.

5. Bisson: "Conceptual Clustering in a First-Order Logic Representation", Proceedings of the 10th ECAI, Vienna 1992.

6. Nguyen & al.: "Checking an Expert System Knowledge Base for Consistency and Completeness", IJCAI 1985, pp 215-319.

7. Rousset: "On the Consistency of Knowledge Bases: The COVADIS System", ECAI 1988, pp 79-84.

8. Loiseau: "Validation, acquisition et mise au point interactive des BC: le système COCO-X fondé sur la cohérence", PhD thesis, Université de Paris-Sud, 1990.

9. Ayel: "Détection d'incohérences dans les bases de connaissances: SACCO", State doctoral thesis, Chambéry, 1987.

10. Politakis & al.: "Using Empirical Analysis to Refine Expert System Knowledge Bases", Artificial Intelligence 22, pp 23-48, 1984.

11. Wilkins: "Knowledge Base Refinement Using Apprenticeship Learning Techniques", Proceedings of the 7th National Conference on Artificial Intelligence, pp 646-651, St. Paul, MN, August 1988.

12. Danyluk: "Finding New Rules for Incomplete Theories: Explicit Biases for Induction with Contextual Information", Proceedings of the 6th International Workshop on Machine Learning, pp 34-36, Ithaca, NY, June 1989.

13. Whitehall: "Knowledge-Based Learning: An Integration of Deductive and Inductive Learning for Knowledge Base Completion", PhD thesis, University of Illinois, Urbana, IL, October 1990.

14. Flann & Dietterich: "A Study of Explanation-Based Methods for Inductive Learning", Machine Learning, 4 (2), pp 187-226, 1989.

15. Mooney & Ourston: "Induction over the Unexplained: Integrated Learning of Concepts with Both Explainable and Conventional Aspects", Proceedings of the 6th International Workshop on Machine Learning, pp 5-7, Ithaca, NY, June 1989.

16. Cohen: "Learning from Textbook Knowledge: A Case Study", Proceedings of the 8th National Conference on Artificial Intelligence, pp 743-748, Boston, MA, July 1990.

17. Mitchell & al.: "EBL: A Unifying View", Machine Learning Journal, vol 1, number 1, Kluwer Academic Publishers, 1986, pp 47-80.

18. Cohen: "Abductive Explanation-Based Learning: A Solution to the Multiple Inconsistent Explanation Problem", Machine Learning Journal, Vol 8, number 2, March 1992.

19. Rajamoney & DeJong: "The Classification, Detection and Handling of Imperfect Theory Problems", IJCAI 1987, pp 205-207.

20. Matwin & Plante: "A Deductive-Inductive Method for Theory Revision", IWML 1991, pp 160-174.


Automatic  Theorem  Generation 
in  Plane  Geometry 


R.  Bagai,  V.  Shanbhogue,  J.  M.  Zytkow,  S.  C.  Chou 


Department  of  Computer  Science 
Wichita  State  University 
Wichita, KS 67260-0083, USA

{bagai, vasant, zytkow, chou}@cs.twsu.edu


Abstract. We introduce a conceptual framework for the discovery of theorems in geometry and a mechanism that systematically discovers such theorems. Our mechanism incrementally generates geometrical situations, makes conjectures about them, uses a geometry theorem prover to determine the consistency of situations, and keeps valid conjectures as theorems. We define geometry situations, situation descriptions, theorems, and the relationships among them that are important for understanding our discovery task. An exhaustive generator of situation descriptions has enormous combinatorial complexity. We analyze various ways to reduce that complexity. Ideally, the generator should create a single description of each situation, should generate more general situations before more specific ones, and should use the previously discovered theorems to constrain its generation mechanism. We describe our generator, which possesses most of these properties, and we outline further improvements. Our theorem prover is based on Wu's algebraic method for proving geometry theorems. We discuss the interface between our situation generator and theorem prover, and the limitations of our discovery system. Examples of theorems discovered by our system are also presented.


1  Introduction 

For over 2000 years, plane geometry has been a focus for mathematicians, who have discovered many enormously complex and elegant theorems, and new results are still being discovered. However, the research strategies employed by humans to arrive at these theorems are far from clear. Moreover, no attempt has been made to develop a mechanism that can systematically discover all theorems, or even a well-defined class of them. So far, the only substantial approaches to automated discovery in mathematics have been directed at the theories of numbers [4, 6, 9]. Since geometry differs so much from the theories of numbers, one may expect it to require a different discovery mechanism.

In this paper, we concentrate on the automated discovery of theorems in geometry. However, if all statements true in the domain of geometry are understood as theorems, we certainly do not wish to discover all of them explicitly. We only want to discover the necessary minimum of theorems that characterize different geometrical situations. We define the notion of a situation, and the necessary and sufficient set of theorems about situations. This leads to a mechanism that systematically formulates conjectures to cover that set of theorems.




Conjectures must be proved before we consider them theorems. We use an algebraic procedure developed by Wu [10, 11] for proving conjectures in geometry. Each conjecture is considered by our theorem prover and, if proved, becomes a theorem. Wu's technique is suitable for conjectures in which the assumption and the conclusion can be expressed as conjunctions of polynomial equations and/or their negations (inequations). Conjectures about situations constructed by our mechanism fall into this category. A prover that incorporates Wu's technique was constructed by Chou [1], and this prover has been used to prove hundreds of conjectures. Many of these conjectures were previously known theorems, but some were genuine guesses. Examples of significant new theorems proved by this prover can be found in [2].

Ideally, the conjecture generator should consider each situation and make all independent conjectures about that situation, but should do so only once. It should also generate simpler situations first, so that each theorem can be detected in the simplest possible situation to which it belongs. In this way we would neither lose any theorems nor repeat any effort. We present a computational mechanism that builds situations from points, lines, and a few relationships among these objects. Other types of objects and relationships could easily be included in that mechanism. We show how the verdict returned by the prover influences the further generation of conjectures.

Despite its early lead before 1980, automated discovery in mathematics has fallen behind two other areas of machine discovery (scientific discovery [5, 8] and discovery in databases [7]), both in the number of people involved and in the scope of research. Lacking efficient theorem-proving capabilities, discovery systems in mathematics, such as AM [6] and CYRANO [4], only make conjectures based on simple induction. Since efficient theorem provers are now available in geometry [1], we can use them to advance machine discovery in mathematics. Unlike AM, which discovers conjectures validated by only a few cases in an infinite domain, our system discovers theorems, validated according to the proof standards used in mathematics. Our system does not consider empirical evidence. Complying with the rules of the game in mathematics, it deals with ideal geometry, in which situations are described by mathematical statements.


2  Situations  and  Theorems 

Intuitively, situations are "pictures" that can be drawn on the plane using basic geometric objects, such as points, lines and circles, arranged in certain relationships with one another. In the subsequent discussion and in our current system, we limit ourselves to points and straight lines, but the method can easily be generalized. Because we concentrate on geometry as a branch of mathematics, not as an empirical science, we do not want to study individual pictures and their empirical properties. Mathematical geometry describes pictures by geometry statements and draws conclusions from those statements. In such a geometry, a class of pictures satisfying a particular description, rather than an individual picture, is a meaningful entity. We will call such classes situations. Situations consist of objects and relationships between them. All objects in a situation are assumed to be distinct.

The  situations  are  described  by  statements  which  we  will  call  situation  descriptions. 




Formally, a situation description consists of a set of literals (atomic statements and their negations) in a given vocabulary of functions on objects and relationships among objects. Descriptions (and the corresponding situations) can be built incrementally. Consider a simple incremental way of constructing a situation leading to Euclid's fifth postulate. Start with the empty picture and then add a line. Add a point not on the line. Add a second line through that point. Request that the second line be parallel to the first line. Add a third line through the same point, but different from the second line. The situation is still consistent, and many corresponding pictures can be drawn. Then request that the third line be parallel to the first line. No picture corresponds to this situation. The last element of the construction introduced an inconsistency, which corresponds to the following theorem:

For any three different lines $l_1$, $l_2$ and $l_3$, and a point $p$ which lies on $l_2$ and $l_3$ but not on $l_1$, if $l_2$ is parallel to $l_1$ then $l_3$ is not parallel to $l_1$.

The same theorem can also be stated as:

For any line l and any point p not on l, there is a unique line that passes through
p and is parallel to l.

In  this  simple  example  we  can  see  that  adding  new  relationships  gradually  constrains 
the  situation  until  it  becomes  inconsistent.  If  the  situation  before  the  last  addition  was 
consistent,  while  after  addition  it  is  inconsistent,  we  get  a  theorem. 

Figure 1 shows an example of a situation and its description for the well-known
Pappus’ theorem: if the three collinear points A1, B1 and C1 are connected as shown
to the three collinear points A2, B2 and C2, then the resulting intersection points A, B
and C are collinear. The conclusion of Pappus’ theorem, on(A, line(B, C)), follows
from that description.

Function symbols, such as line, are used to denote objects defined by primitive objects.
For example, the term line(B1, C1) denotes the line joining the points B1 and C1. This
works whenever a defined object can be proved to exist uniquely. Moreover, inclusion
of circles, for instance, would involve additional relationships, such as center(P, C)
or tangent(L, C), for describing, respectively, that point P is the center of circle C or
that line L and circle C are tangential to each other.

In this paper and in our discovery system, we use the following vocabulary of function
and predicate symbols:

Function:

line(x, y)        denotes the line passing through points x and y

Predicates:

on(x, l)          point x is on line l

parallel(l, m)    lines l and m are parallel and distinct

intersect(l, m)   lines l and m intersect and are distinct

For instance, the atomic statement on(A1, line(B1, C1)) states that the point A1 lies on
the line joining B1 and C1. Additional predicates can be easily added.

Definition 1. A situation description (or just description) is a pair (I, R), where
I is some set of ingredient points and R is a set of atomic statements and/or negations
of such statements about points in I that can be made in the vocabulary.
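To make Definition 1 concrete, here is a minimal Python sketch of one possible encoding of descriptions. The tuple layout, the `Description` class and the helper names (`line`, `atom`, `well_formed`) are our own assumptions for illustration. They are not the data structures of the authors' system.

```python
from typing import FrozenSet, NamedTuple, Tuple, Union

# Hypothetical encoding: a point is a string, a line term is ("line", p, q).
LineTerm = Tuple[str, str, str]
Term = Union[str, LineTerm]
Literal = Tuple[bool, str, Term, Term]  # (positive?, predicate, arg1, arg2)


def line(p: str, q: str) -> LineTerm:
    """Term line(p, q); endpoints are sorted since line(p, q) = line(q, p)."""
    if p == q:
        raise ValueError("line(x, y) needs two distinct points")
    a, b = sorted((p, q))
    return ("line", a, b)


def atom(pred: str, a: Term, b: Term, positive: bool = True) -> Literal:
    """Build a literal; parallel and intersect are symmetric, so sort their args."""
    if pred in ("parallel", "intersect"):
        a, b = sorted((a, b))
    return (positive, pred, a, b)


class Description(NamedTuple):
    points: FrozenSet[str]          # the ingredient points I
    relations: FrozenSet[Literal]   # the literals R


def points_of(term: Term) -> set:
    """Point names mentioned by a point or a line term."""
    return {term} if isinstance(term, str) else set(term[1:])


def well_formed(d: Description) -> bool:
    """True if every literal only talks about ingredient points of d."""
    return all(points_of(a) | points_of(b) <= d.points
               for _, _, a, b in d.relations)


# Part of the Pappus description of Figure 1 (three of its relationships).
pappus_start = Description(
    frozenset({"A1", "B1", "C1", "A2", "B2", "C2"}),
    frozenset({atom("on", "A1", line("B1", "C1")),
               atom("on", "A2", line("B2", "C2")),
               atom("on", "A1", line("B2", "C2"), positive=False)}),
)
```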




Ingredient Points:

A, B, C, A1, B1, C1, A2, B2, C2

Relationships:

on(A1, line(B1, C1))
on(A2, line(B2, C2))
not on(A1, line(B2, C2))
not on(B1, line(B2, C2))
not on(C1, line(B2, C2))
not on(A2, line(B1, C1))
not on(B2, line(B1, C1))
not on(C2, line(B1, C1))
on(A, line(A1, B2))
on(A, line(A2, B1))
on(B, line(A1, C2))
on(B, line(A2, C1))
on(C, line(B1, C2))
on(C, line(B2, C1))


Fig. 1. A situation and its description


Situation descriptions can be either consistent or inconsistent. Situations correspond
to consistent descriptions, while inconsistent descriptions do not describe any situations.

Detection  of  Theorems 

Pappus’ theorem holds in the situation described in Figure 1. The description in Figure
1 includes only the hypothesis (the “if” part) of the theorem and does not contain the
conclusion that the points A, B and C are collinear. If the description in Figure 1 is
augmented by the atomic statement:

not on(A, line(B, C))

then the resulting description is inconsistent, i.e. it corresponds to no picture, or
equivalently, no coordinates could be assigned to the ingredient points. This suggests the
following strategy to arrive at theorems. We start with the simplest situation description,
containing the empty sets of ingredient points and relationships, and keep expanding it
by adding new points and/or relationships. Each description is tested for consistency,
and new relationships are added until the description becomes inconsistent. Such a
transition from a consistent to an inconsistent description results in a theorem. The
consistent description forms the hypotheses of the theorem, and the negation of the new
relationship forms its conclusion. Formally, if S is a consistent description and A is a
statement such that $S \cup \{A\}$ is inconsistent, then we conclude the theorem:

$$S \rightarrow \neg A.$$
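Once literals carry a sign, the theorem-formation rule $S \rightarrow \neg A$ is mechanical. The following is a hedged sketch. It assumes literals are encoded as `(positive, atom_text)` pairs, which is our own simplification, and `make_theorem` is a hypothetical helper name.

```python
from typing import FrozenSet, Tuple

Literal = Tuple[bool, str]  # hypothetical encoding: (positive?, atomic statement)


def negate(lit: Literal) -> Literal:
    """Flip the sign of a literal."""
    positive, text = lit
    return (not positive, text)


def show(lit: Literal) -> str:
    positive, text = lit
    return text if positive else "not " + text


def make_theorem(consistent: FrozenSet[Literal], added: Literal) -> str:
    """Given consistent S and A with S ∪ {A} inconsistent, state S -> not A."""
    hypotheses = " and ".join(sorted(show(l) for l in consistent))
    return f"if {hypotheses} then {show(negate(added))}"


# Pappus (only two of the hypotheses of Figure 1 are listed, for brevity).
hyp = frozenset({(True, "on(A1, line(B1, C1))"), (True, "on(A2, line(B2, C2))")})
print(make_theorem(hyp, (False, "on(A, line(B, C))")))
# prints: if on(A1, line(B1, C1)) and on(A2, line(B2, C2)) then on(A, line(B, C))
```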

We do not expand inconsistent descriptions any further, because we would obtain large
numbers of trivial theorems. Situations can be expanded in various directions, so this
discovery method produces a tree of descriptions. Interesting theorems lie at the leaves
of that tree.

The description in Figure 1 can be expanded in many ways, for instance by addition
of a new point D or addition of the relationship parallel(line(A1, B1), line(A2, B2)).
Some expansions do not result in an inconsistent description. However, an expansion
obtained by adding not on(A, line(B, C)) results in an inconsistent description leading
to Pappus’ theorem.

3  Description  Generator 

In  this  section  we  discuss  properties  desirable  for  an  efficient  situation  description 
generator. 

For any natural number n, the number of descriptions containing exactly n distinct points
is finite. However, many descriptions describe the same situation. For example,
for three ingredient points p1, p2 and p3, the descriptions

({p1, p2, p3}, {on(p1, line(p2, p3))})

and

({p1, p2, p3}, {on(p2, line(p1, p3))})

represent the same geometric situation.

Definition 2. Descriptions S and T are isomorphic if S can be transformed into T
by renaming the ingredient points.
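Definition 2 can be checked by brute force. Try every bijection between the two sets of ingredient points, then compare the renamed relationship sets. This minimal sketch assumes a tuple encoding of literals, `(positive, predicate, arg1, arg2)` with line terms `("line", p, q)`. That encoding is ours, not the authors'. The sketch checks the two three-point descriptions above.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Tuple

Description = Tuple[FrozenSet[str], FrozenSet[tuple]]   # (I, R)


def line(p: str, q: str) -> tuple:
    """Line term with sorted endpoints, since line(p, q) = line(q, p)."""
    a, b = sorted((p, q))
    return ("line", a, b)


def rename(desc: Description, m: Dict[str, str]) -> Description:
    """Rename the ingredient points of desc according to the bijection m."""
    points, rels = desc

    def term(t):
        return m[t] if isinstance(t, str) else line(m[t[1]], m[t[2]])

    def lit(l):
        positive, pred, a, b = l
        a, b = term(a), term(b)
        if pred in ("parallel", "intersect"):    # symmetric predicates
            a, b = sorted((a, b))
        return (positive, pred, a, b)

    return frozenset(m[p] for p in points), frozenset(lit(l) for l in rels)


def isomorphic(s: Description, t: Description) -> bool:
    """Definition 2 by brute force: try all |I|! renamings of s's points."""
    ps, pt = sorted(s[0]), sorted(t[0])
    if len(ps) != len(pt) or len(s[1]) != len(t[1]):
        return False
    return any(rename(s, dict(zip(ps, perm))) == t for perm in permutations(pt))


P = frozenset({"p1", "p2", "p3"})
s = (P, frozenset({(True, "on", "p1", line("p2", "p3"))}))
t = (P, frozenset({(True, "on", "p2", line("p1", "p3"))}))
```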

Since the number of descriptions that are isomorphic to a given description can grow as
fast as n! with the number n of ingredient points, and all isomorphic descriptions are
equivalent, it is highly desirable to limit the production of isomorphic descriptions. We
do not want our system to spend time proving equivalent theorems, nor do we want to
present them as distinct theorems. This leads to the following property:

Property 3. Exactly one description in each isomorphism class should be generated.


As a description is essentially a conjunction of atomic statements, the notion of
implication between descriptions can be defined as follows:

Definition 4. A description S implies description T if the relationships in S imply
those in T.

Intuitively, if S implies T, T describes a simpler (more general) setting than S. Such
T should be generated before S, so that all theorems about T are discovered before
theorems about S. Theorems which hold for T hold also for S, but according to the
rules of the game in mathematics, need not be stated about S because they include some
unnecessary assumptions. This leads to the following property:

Property 5. Let $(S_1, S_2, \ldots)$ be the order in which descriptions are generated by
the generator. If $S_i$ implies $S_j$, then we should have $j < i$.

In the next section we will present a generation method that approximates Properties 3
and 5.




4  A  Discovery  Algorithm 

The set of all situation descriptions is recursive, and a systematic generation of all
descriptions can be easily accomplished by starting from simple descriptions and making
them ‘grow’. Descriptions can be grown in two orthogonal directions: by imposing more
relationships among their objects, or by adding more objects. Only steps in the former
direction can introduce inconsistency.

In addition to being easy to implement, incremental generation of descriptions has
other advantages. First, when the latest relationship added to a description renders it
inconsistent, that relationship can be picked out as the negation of the conclusion of the
theorem; all other relationships are the hypotheses of the theorem. Second, it helps to
enforce Property 5 defined in the previous section. Third, it is necessary for discovering
a theorem in the simplest situation, without any extraneous objects and/or relationships.
Further, it makes it easy to avoid proposing the same, already proven conclusion in more
complex settings, where it would make a theorem but would include some unnecessary
assumptions. Fourth, the growth of an inconsistent description can be terminated right
after the inconsistency has been detected, to avoid trivial theorems. Fifth, it approximates
the way a human discoverer would proceed to uncover theorems.

Let $\mathcal{A}$ denote the set of all situation descriptions. We define a binary relation $\prec$ on
$\mathcal{A}$ as follows:

Definition 6. $(I_1, R_1) \prec (I_2, R_2)$ iff one of the following is true:

1. $I_2 = I_1 \cup \{P\}$ and $R_2 = R_1$, for some point $P$ not already in $I_1$;

2. $I_2 = I_1$ and $R_2 = R_1 \cup \{A\}$, for some atomic relationship $A$ not already in $R_1$.
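The following sketches the one-step relation $\prec$ of Definition 6. A successor either adds one fresh point or adds one literal that is not already present. Several details are our assumptions:

* the enumeration of literals over the vocabulary {line; on, parallel, intersect},
* the inclusion of both signs of every atom (the paper's "atomic relationship" is read as covering negations),
* the fresh-point naming scheme `P0, P1, ...`.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Description = Tuple[FrozenSet[str], FrozenSet[tuple]]  # (I, R)


def line(p: str, q: str) -> tuple:
    a, b = sorted((p, q))
    return ("line", a, b)


def literals(points: FrozenSet[str]) -> List[tuple]:
    """All literals (both signs) expressible over the given points."""
    pts = sorted(points)
    lines = [line(p, q) for p, q in combinations(pts, 2)]
    atoms = [("on", x, l) for x in pts for l in lines]
    atoms += [(pred, l, m) for pred in ("parallel", "intersect")
              for l, m in combinations(lines, 2)]
    return [(sign,) + a for a in atoms for sign in (True, False)]


def successors(desc: Description) -> List[Description]:
    """Immediate successors of desc under the relation ≺ (Definition 6)."""
    points, rels = desc
    i = 0
    while f"P{i}" in points:          # hypothetical naming of a fresh point
        i += 1
    result = [(points | {f"P{i}"}, rels)]              # case 1: add a point
    result += [(points, rels | {lit})                   # case 2: add a literal
               for lit in literals(points) if lit not in rels]
    return result
```

With three points there are 3 lines, 9 `on` atoms and 3 + 3 line pairs, so 15 atoms and 30 literals. The empty description has exactly one successor.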

Let $\prec^+$ be the transitive closure of the above relation. Adding more points or atomic
relationships to any description S will yield another description T such that $S \prec^+ T$. It
is easy to see that $\prec^+$ is an (irreflexive) partial order on $\mathcal{A}$, with least element
$S_0 = (\emptyset, \emptyset)$. The following proposition is straightforward:

Proposition 7. For any binary relation $R$ on $\mathcal{A}$, if $R^+ = \prec^+$ then $\prec \subseteq R$.

In other words, $\prec$ is the smallest relation with transitive closure $\prec^+$. Thus it partitions
$\mathcal{A}$ into a sequence of disjoint layers $L_0, L_1, L_2, \ldots$, as shown in Figure 2, where each
layer $L_k$ consists of all those descriptions that are at distance $k$ from $S_0$. (The distance of
any description $S = (I, R)$ from $S_0$ is $|I| + |R|$.) A breadth-first algorithm that generates
descriptions from layer $L_k$ only after all descriptions in layers lower than $L_k$ have been
generated guarantees globally that no smaller description is generated after a bigger
one. That is, if $(S_1, S_2, \ldots)$ is the sequence of descriptions generated, then $S_i \prec^+ S_j$
implies $i < j$.
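The ordering guarantee can be tested on any generated sequence. The sketch below uses one characterisation: $S \prec^+ T$ holds exactly when T contains all the points and relationships of S and differs from it. This is our reading of Definition 6 and assumes the intermediate descriptions stay well formed. Descriptions are encoded as pairs of frozensets, which is also an assumption.

```python
from typing import FrozenSet, Sequence, Tuple

Description = Tuple[FrozenSet[str], FrozenSet[str]]   # (I, R), hypothetical encoding


def precedes_plus(s: Description, t: Description) -> bool:
    """S ≺+ T iff T arises from S by adding points and/or relationships."""
    return s != t and s[0] <= t[0] and s[1] <= t[1]


def distance(s: Description) -> int:
    """Layer index |I| + |R|, i.e. the distance from S0 = (∅, ∅)."""
    return len(s[0]) + len(s[1])


def respects_order(seq: Sequence[Description]) -> bool:
    """True iff S_i ≺+ S_j implies i < j for the generated sequence."""
    return not any(precedes_plus(seq[j], seq[i])
                   for i in range(len(seq)) for j in range(i + 1, len(seq)))


s0 = (frozenset(), frozenset())
s1 = (frozenset({"A"}), frozenset())
s2 = (frozenset({"A", "B"}), frozenset())
```

Sorting by `distance` (layer by layer) always yields an order that passes the check, because every $\prec$ step increases the distance by one.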


Interface  with  the  Prover 

The interface between the discoverer and the prover can be described by a simple
dataflow network, shown in Figure 3. In the overall organization of the discoverer, the
prover is called at those nodes of the situation description graph where a new relationship
has been added and it is not known whether the resultant description is consistent.

[Figure: the layers L0 (containing S0), L1, L2, L3 of situation descriptions]

Fig. 2. The layers of situation descriptions

[Figure: the Description Generator passes a description S to the Prover, which replies “S is Consistent/Inconsistent”]

Fig. 3. Interface between description generator and theorem prover

The description generator passes the descriptions on to the prover, which recognizes
them as either consistent or inconsistent. If the prover recognizes a description S as
inconsistent, the generator creates a theorem.

Let  us  consider  a  simple  discovery  algorithm: 

Procedure DISCOVER.

1 Create a queue Q, consisting solely of the description $S_0 = (\emptyset, \emptyset)$.

2 LOOP: Remove the front description from Q. Call this description S.

3 Create all successors of S with respect to the $\prec$ relation.
  Insert each successor into Q, if not already there.

4 If S is already tagged ‘inconsistent’, assign tag ‘inconsistent’ to each
  successor of S in Q.

5 Else use the prover to test consistency of S.
  If S is inconsistent, output the theorem corresponding to S
  and assign tag ‘inconsistent’ to each successor of S in Q.

6 Go LOOP.
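A minimal Python sketch of DISCOVER follows. It is not the authors' implementation and rests on these assumptions:

* descriptions are any hashable values,
* `successors` and `is_consistent` are supplied by the caller, and `is_consistent` stands in for the prover,
* a Python set plays the role of the sameness check,
* `max_steps` is our own addition, because the procedure as stated never halts.

Every successor of a description in layer $L_k$ lies in $L_{k+1}$, and the queue is FIFO. So a successor is never dequeued before its parents have had the chance to tag it.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List


def discover(start: Hashable,
             successors: Callable[[Hashable], Iterable[Hashable]],
             is_consistent: Callable[[Hashable], bool],
             max_steps: int = 10_000) -> List[Hashable]:
    """Breadth-first sketch of procedure DISCOVER; returns the inconsistent
    descriptions found by the prover (each one yields a theorem)."""
    queue = deque([start])          # step 1
    seen = {start}                  # 'sameness' check via a hash set
    tagged = set()                  # descriptions tagged 'inconsistent'
    found = []
    for _ in range(max_steps):
        if not queue:
            break
        s = queue.popleft()                      # step 2
        succ = list(successors(s))               # step 3
        for t in succ:
            if t not in seen:
                seen.add(t)
                queue.append(t)
        if s in tagged:                          # step 4: inherit the tag
            tagged.update(succ)
        elif not is_consistent(s):               # step 5: ask the prover
            found.append(s)
            tagged.update(succ)
    return found                                 # step 6 is the loop itself


# Toy run (not geometry): a description is inconsistent if it holds "p" and "not p".
FACTS = ("p", "not p", "q")
calls = []


def toy_prover(s):
    calls.append(s)
    return not {"p", "not p"} <= s


theorems = discover(frozenset(),
                    lambda s: [s | {f} for f in FACTS if f not in s],
                    toy_prover)
```

In the toy run, the 3-fact description containing both "p" and "not p" is inherited as inconsistent and never reaches the prover. So only 7 of the 8 descriptions are tested.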

This algorithm checks “sameness” of descriptions and produces a graph rather than a
tree. Observe that the algorithm expands inconsistent descriptions, though no expansion
of an inconsistent description is ever submitted to the prover. Such an expansion is
useful because descriptions have more than one parent. As long as even one parent of
S is inconsistent, S is labeled inconsistent and is not considered by the prover.




Handling  Isomorphism 

As discussed in the previous section, each description has a large number of isomorphic
versions. It is wasteful to invoke the prover for more than one description in any
isomorphism class. One way to avoid this is to create all isomorphic descriptions at
the time when the first description in a given isomorphism class is generated, and to
store all isomorphic equivalents in the hash table, tagging them at the same time as
inconsistent or as consistent. Though this mechanism avoids unnecessary invocations
of the prover, creating all isomorphic equivalents for every description is not very
attractive either, because it leads to high storage complexity. A better strategy for handling
isomorphism is based upon a canonical order in which relationships are added to the
situation descriptions, so that only one situation description in each isomorphism class
is ever created, but we have not yet implemented this solution.
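Instead of storing every isomorphic equivalent, one can store a single canonical key per isomorphism class. The sketch below uses a swapped-in technique, brute-force canonical labelling, and is not the canonical-order strategy proposed above. It works in three steps:

* rename the points to x0, x1, ... in every possible way,
* keep the smallest sorted literal list as the key,
* memoise a caller-supplied prover on that key.

Storage drops to one entry per class, but computing the key still costs n! renamings. The literal encoding and the names `canonical` and `ProverCache` are our assumptions.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Tuple

Description = Tuple[FrozenSet[str], FrozenSet[tuple]]   # (I, R)


def line(p: str, q: str) -> tuple:
    a, b = sorted((p, q))
    return ("line", a, b)


def relabel(lit: tuple, m: Dict[str, str]) -> tuple:
    """Apply a point renaming to a literal (positive?, pred, arg1, arg2)."""
    positive, pred, a, b = lit
    term = lambda t: m[t] if isinstance(t, str) else line(m[t[1]], m[t[2]])
    a, b = term(a), term(b)
    if pred != "on":                       # parallel, intersect are symmetric
        a, b = sorted((a, b))
    return (positive, pred, a, b)


def canonical(desc: Description) -> tuple:
    """Smallest relabelling over all n! bijections onto x0, x1, ..."""
    points, rels = desc
    pts = sorted(points)
    names = [f"x{i}" for i in range(len(pts))]
    best = min(tuple(sorted(relabel(l, dict(zip(pts, perm))) for l in rels))
               for perm in permutations(names))
    return (len(pts), best)


class ProverCache:
    """Calls the (caller-supplied) prover once per isomorphism class."""

    def __init__(self, prover: Callable[[Description], bool]):
        self.prover = prover
        self.table: Dict[tuple, bool] = {}

    def is_consistent(self, desc: Description) -> bool:
        key = canonical(desc)
        if key not in self.table:
            self.table[key] = self.prover(desc)
        return self.table[key]


P = frozenset({"p1", "p2", "p3"})
s = (P, frozenset({(True, "on", "p1", line("p2", "p3"))}))
t = (P, frozenset({(True, "on", "p2", line("p1", "p3"))}))
```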

Handling  the  Implication  Requirement 

While the generation strategy outlined above has been easy to implement, the DISCOVER
algorithm only approximates Property 5, defined in the previous section,
which requires that if $S_i$ is implied by $S_j$ ($S_j$ is more specific than $S_i$) then $S_i$ is
generated before $S_j$. A violation is illustrated by the following example:

S = ({A, B, C, D}, {parallel(line(A, B), line(C, D))})
T = ({U, V, W, X, Y}, {not on(U, line(V, W))})

T is implied by S but $T \not\prec^+ S$, because it includes more points. Since S is in the layer
$L_5$ and T is in the layer $L_6$, the algorithm generates S before T.

5  Example  Theorems 

In this section we present some simple examples of theorems discovered by an
implementation of our method.

Example  1.  Let  ABCD  be  a  parallelogram.  Then  the  diagonals  AC  and  BD  intersect. 
This theorem was discovered from the following 4-point description:

Ingredient Points:

A, B, C, D.

Relationships:

parallel(line(A, B), line(C, D))
parallel(line(A, D), line(B, C))
parallel(line(A, C), line(B, D))

Since this description was tagged ‘inconsistent’ by the prover, the example theorem was
formed.




Example 2 (Euclid’s 5th postulate). Let l be a line and P any point not on l. Then
there is a unique line passing through P that is parallel to l.

The description corresponding to this theorem is:

Ingredient Points:

A, B, P, Q, R.

Relationships:

not on(R, line(P, Q))
parallel(line(P, Q), line(A, B))
parallel(line(P, R), line(A, B))


6  Limitation  of  the  Prover 

The  underlying  prover  is  based  upon  Wu’s  method  [10,  11,  3],  which  takes  any  set 
of  polynomial  equations  and  inequations  as  input  and  determines  whether  or  not  they 
have  a  complex  solution  (a  solution  in  complex  numbers).  Before  a  description  S  is 
presented  to  the  prover  it  is  transformed  into  a  collection  C  of  polynomial  equations  and 
inequations.  If  C  is  inconsistent,  i.e.  if  it  has  no  complex  solution,  then  it  has  no  solution 
in  real  numbers  either,  and  S  is  inconsistent  as  well.  However,  if  C  is  consistent,  Wu’s 
method  says  nothing  about  the  existence  of  a  teal  solution  for  C,  so  that  5  may  have  no 
model  in  plane  geometry.  Our  system  treats  5  as  consistent  and  when  no  real  solution 
exists,  it  misses  the  corresponding  theorem  in  the  process.  For  this  reason,  our  method 
is  sound  but  not  complete  for  plane  geometry.  It  is  complete  only  for  metric  geometry 
introduced  by  Wu  (see  [2]). 
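Before Wu's method can be applied, each literal has to become a polynomial equation or inequation over the point coordinates. The paper does not spell out its translation, so the sketch below uses a common determinant encoding, which is our assumption:

* on(x, line(p, q)) becomes a vanishing 2×2 determinant,
* parallelism becomes a vanishing cross product of direction vectors, plus a non-vanishing term for distinctness.

The code only evaluates these conditions on one concrete picture, using exact rationals. Deciding consistency means asking whether some coordinates satisfy all of them (complex ones, for Wu's method), which this code does not attempt.

```python
from fractions import Fraction
from typing import Dict, Tuple

Point = Tuple[Fraction, Fraction]


def cross(o: Point, p: Point, q: Point) -> Fraction:
    """z-component of (p - o) x (q - o); zero iff o, p, q are collinear."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


def direction_cross(a: Point, b: Point, c: Point, d: Point) -> Fraction:
    """Zero iff the directions of line(a, b) and line(c, d) are parallel."""
    return (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])


def holds(pred: str, args, coords: Dict[str, Point]) -> bool:
    """Evaluate one positive literal on concrete coordinates (one picture)."""
    if pred == "on":                       # on(x, line(p, q))
        x, (p, q) = args
        return cross(coords[p], coords[q], coords[x]) == 0
    (a, b), (c, d) = args                  # pred(line(a, b), line(c, d))
    A, B, C, D = (coords[k] for k in (a, b, c, d))
    same_direction = direction_cross(A, B, C, D) == 0
    if pred == "parallel":                 # parallel and distinct
        return same_direction and cross(A, B, C) != 0
    if pred == "intersect":                # non-parallel lines meet in one point
        return not same_direction
    raise ValueError(f"unknown predicate {pred}")


F = Fraction
square = {"A": (F(0), F(0)), "B": (F(1), F(0)), "C": (F(1), F(1)), "D": (F(0), F(1))}
```

For the unit square, the first two relationships of Example 1 hold but the third fails. This is consistent with the theorem that a parallelogram's diagonals are not parallel. Of course, a single picture proves nothing about all pictures.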

7  Conclusions  and  Future  Work 

We  have  presented  a  mechanism  which  incrementally  generates  geometric  situation 
descriptions,  tests  them  for  consistency,  and  generates  a  theorem  for  each  inconsistent 
description. Theorems generated by our method do not include redundant conditions in
their “if” parts. Our method also avoids theorems which describe isomorphic situations.
This way we not only concentrate on interesting and non-redundant theorems, but also
cut down the combinatorial complexity of the task, so that our system can reach
more complex theorems within the same computational resources. Though our simple
technique  does  not  avoid  generation  of  isomorphic  descriptions,  a  more  efficient  strategy 
to  deal  with  this  problem  is  possible.  This  strategy,  based  on  incremental  addition  of 
relationships  in  a  specific  canonical  order  is  left  for  future  implementation. 

The  current  theorem  discovery  method  does  not  use  the  theorems  which  have  been 
discovered  to  alter  its  situation  generation  strategy.  Our  significantly  more  ambitious 
goal  is  to  develop  a  mechanism  which  uses  the  theorems  to  limit  the  situation  generation. 




Such a method would begin with no knowledge and learn from its own discoveries. For
example, discovery of the simple theorem “on(A, line(A, B))” implies that no statement
of this type should be used to augment any situation description. Similarly, the discovery
“If on(A, line(B, C)) then on(B, line(A, C))” can be used to avoid generation of many
descriptions.

Currently, our system supports the function line and three predicates: on, parallel,
and intersect, and can discover theorems involving points and lines in these
relationships. Expanding the system to handle objects such as circles and relations between
lines, points, and circles is easy and will lead to further theorems.

References 

1. S. C. Chou. Proving elementary geometry theorems using Wu’s algorithm. Contemporary
Mathematics, 29:243-286, 1984.

2. S. C. Chou. Mechanical Theorem Proving. D. Reidel Publishing Company,
Dordrecht, Netherlands, 1988.

3. S. C. Chou and X. S. Gao. Ritt-Wu’s decomposition algorithm and geometry
theorem proving. In Proceedings of CADE-10, 1990. Also in LNCS, Vol. 449,
pages 207-220, Springer-Verlag, 1990.

4. K. B. Haase. Cyrano-3: an experiment in representational invention. In J. M.
Zytkow, editor, Proceedings of the ML-92 Workshop on Machine Discovery,
Aberdeen, U.K., pages 153-160, 1992.

5. P. Langley, H. A. Simon, G. L. Bradshaw, and J. M. Zytkow. Scientific discovery:
Computational explorations of the creative processes. MIT Press, Cambridge,
MA, 1987.

6. D. B. Lenat. Automated theory formation in mathematics. Contemporary
Mathematics, 29:287-314, 1984.

7. Piatetsky-Shapiro and Frawley (Eds.). Knowledge Discovery in Databases.
AAAI Press, Menlo Park, CA, 1991.

8. Shrager and Langley (Eds.). Computational Models of Scientific Discovery and
Theory Formation. Morgan Kaufmann Publishers, San Mateo, CA, 1990.

9. M. H. Sims. Empirical and analytic discovery in IL. In Proceedings of the 4th
International Workshop on Machine Learning, Irvine, 1987.

10. Wen-Tsun Wu. On the decision problem and the mechanization of theorem proving
in elementary geometry. Scientia Sinica, 21:157-179, 1978.

11. Wen-Tsun Wu. Basic principles of mechanical theorem proving in geometries.
Journal of System Sciences and Mathematical Sciences, 4(3):207-235, 1984.


Learning  Simple  Recursive  Theories 


A. Giordana, L. Saitta and C. Baroglio

Università di Torino, Dipartimento di Informatica,
Corso Svizzera 185, 10149 TORINO (Italy)


Abstract.  The  task  of  learning  relations  has  been  concerned,  so  far,  with  the 
acquisition  of  intensional  descriptions  of  unrelated  concepts.  However,  in  many  real 
domains  concepts  are  strictly  related  to  each  other  and  the  instances  of  one  of  them 
cannot  possibly  be  recognised  without  previous  recognition  of  other  objects  as 
instances  of  related  concepts.  A  typical  case  is  the  problem  of  labelling  parts  of  a 
scene  in  order  to  interpret  it.  This  paper  extends  in  several  ways  the  learning 
relations  paradigm;  in  particular,  a  new  methodology,  allowing  a  recursive  theory  to 
be  inferred  from  a  set  of  examples,  is  presented.  The  learning  algorithm  works 
bottom-up,  creating  first  an  acyclic  graph  that  classifies  all  the  instances  in  the 
training  set.  Afterwards,  a  recursive  theory  is  synthesised  from  the  graph. 


1  Introduction 

In this paper we present an algorithm, called RTL (Recursive Theory Learner), for learning
recursive function-free Horn theories. The first algorithm in the literature
for inducing logic programs is due to Shapiro [1], followed later by others, such as
FOIL [2] and CIGOL [3].

However, when several mutually dependent target relations are to be learned, these
approaches suffer from the drawback of excessive complexity. Let us consider, for
instance, the following theory defining a recursive relation for $h_1(x)$:

$$c_1:\ h_1(y) \wedge \varphi(x,y) \rightarrow h_1(x), \qquad c_2:\ \alpha(x,y) \rightarrow h_1(x) \qquad (1.1)$$

Suppose we apply FOIL’s learning technique to generate the two clauses $c_1$ and $c_2$ from
a set of ground instances. Clause $c_2$ will be learned first; then, using $c_2$ as an
operational definition for $h_1(x)$, it will be possible to learn clause $c_1$. Clause $c_1$ is
applied recursively until no new positive instances of $h_1$ can be covered. Suppose,
moreover, that clauses $c_1$ and $c_2$ together do not exhaust the relation $h_1$ and that a new
clause, $c_3: \psi(x,y) \rightarrow h_1(x)$, now needs to be learned. Adding $c_3$ to the set of clauses
defining $h_1$ would require repeating the extensional consistency test of clause $c_1$, because
$c_1$ will now also involve $\psi(x,y)$. This leads to a combinatorial explosion of extensional
testing.

In the case of the definition of a single predicate $h_1(x)$, the complexity of the above
process can still be managed. However, it becomes really unmanageable when mutual
recursion occurs. We therefore propose another method that does not suffer from this
drawback. The method bears some similarities to the ones proposed by Shapiro [1] and
by Bierman [4], but introduces some important novelties in the search
heuristics, which allow the complexity to be dramatically reduced. Algorithm RTL is
based on a two-phase learning procedure. In the first phase, non-recursive definitions
are learned by induction. In the second one, the learned theory is generalised by
introducing recursive definitions in such a way that non-monotonicity problems do not
arise. The algorithm is restricted to learning recursive theories that can be obtained by
using only one kind of recursive generalisation, which preserves correctness with respect
to the learning set. Nevertheless, this class of theories is still meaningful for real-world
applications. Finally, the method is extended to learn recursive programs stratified with
respect to negation [5], which have an acyclic ground graph over the learning set.

* This work has been done in the framework of the ESPRIT Basic Research Action N.
7274 funded by the EEC.

2  Basic  Notions  and  Properties 

In this section some basic notions and properties will be introduced, which are exploited
by RTL in order to apply monotonic generalising transformations. Given a non-recursive
theory $T$, we will investigate what kinds of transformations, introducing
recursion while preserving the correctness of $T$ with respect to the learning set $F$, can
be applied to $T$. In the following, $x$ and $y$ denote sets of variables.

Definition 1: Given two relations $R$, $R'$ of the same arity $k$, corresponding to the
predicates $h$ and $h'$ respectively, $R'$ is said to be a subrelation of $R$ iff the clause
$h' \rightarrow h$ is true. □

Definition 1 implies that $R' \subseteq R$.

Definition 2: Given a pair $\langle h, k \rangle$ of predicates corresponding to subrelations of $R$, a
clause $c: h \wedge \varphi \rightarrow k$, where $h$ and $k$ have different variables and $\varphi$ is a conjunctive
formula not mentioning any subrelation of $R$, is said to be a simple clause over $h$. □

Definition 3: Given a predicate $k$ corresponding to a subrelation of $R$, a clause $c: \varphi \rightarrow k$
is said to be a primitive definition of $k$ iff $\varphi$ is a conjunctive formula not mentioning
any subrelation of $R$. □

Definition 4: Two clauses $c_1$ and $c_2$ are said to be correspondent with respect to a relation $R$
if they can be made identical by properly renaming the predicates, corresponding to
subrelations of $R$, that occur in them. □

As an example, the two clauses $h(y) \wedge \varphi(x,y) \rightarrow k(x)$ and $h'(y) \wedge \varphi(x,y) \rightarrow k(x)$, where
$h$ and $h'$ denote subrelations of $R$, are correspondent with respect to $R$, because they can
be made identical by renaming $h$ as $h'$ or vice versa. On the contrary, the two clauses
$h(y) \wedge \varphi(x,y) \rightarrow k(x)$ and $h'(y) \wedge \varphi(x,y) \rightarrow k'(x)$ are not correspondent.
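Definition 4 can be checked mechanically. The sketch below uses our own encoding: an atom is a predicate name with a tuple of variables, and a clause is a body set plus a head. We read “properly renaming” as a one-to-one renaming of the subrelation predicates and search all such renamings. This interpretation is an assumption. The test reproduces the two examples above.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate, variables)
Clause = Tuple[FrozenSet[Atom], Atom]       # (body, head)


def clause(body: Iterable[Atom], head: Atom) -> Clause:
    return (frozenset(body), head)


def sub_preds(c: Clause, sub: Set[str]) -> list:
    """Subrelation predicates (members of `sub`) occurring in clause c."""
    body, head = c
    return sorted({pred for pred, _ in (*body, head) if pred in sub})


def rename(c: Clause, mapping: dict) -> Clause:
    body, head = c
    r = lambda a: (mapping.get(a[0], a[0]), a[1])
    return (frozenset(r(a) for a in body), r(head))


def correspondent(c1: Clause, c2: Clause, sub: Set[str]) -> bool:
    """Definition 4: can c1 and c2 be made identical by a one-to-one renaming
    of the predicates in `sub` (the subrelations of R) occurring in them?"""
    p1, p2 = sub_preds(c1, sub), sub_preds(c2, sub)
    if len(p1) != len(p2):
        return False
    return any(rename(c1, dict(zip(p1, perm))) == c2 for perm in permutations(p2))


S = {"h", "h'"}
c1 = clause([("h", ("y",)), ("phi", ("x", "y"))], ("k", ("x",)))
c2 = clause([("h'", ("y",)), ("phi", ("x", "y"))], ("k", ("x",)))
c3 = clause([("h'", ("y",)), ("phi", ("x", "y"))], ("k'", ("x",)))
```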

Definition 5: A theory $T_s$ is said to be simply recurrent iff all the predicates occurring
in the heads of the clauses of $T_s$ belong to a predicate set $S(R)$ corresponding to
subrelations of $R$, and there exists a bijective mapping from the elements of $S(R)$ to the
set of integers $J = \{\, j \mid 0 \le j \le |S(R)|-1 \,\}$ such that:

(a) The predicate $h^{(0)} \in S(R)$ is defined by a set of primitive definitions.

(b) The predicate $h^{(1)} \in S(R)$ is defined by a set $C_1(h^{(0)}, h^{(1)})$ of simple clauses on $h^{(0)}$.

(c) Any other predicate $h^{(i)} \in S(R)$ $(2 \le i \le |S(R)|-1)$ is defined by a set
$C_i(h^{(i-1)}, h^{(i)})$ of simple clauses, whose elements are in one-to-one correspondence
with the elements of $C_1(h^{(0)}, h^{(1)})$.

(d) No other clause belongs to $T_s$. □

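As an example, the following $T_s$ is a simply recurrent theory:

$$T_s = \{\ \alpha(x,y) \rightarrow h^{(0)}(x);\ \ h^{(0)}(y) \wedge \varphi(x,y) \rightarrow h^{(1)}(x);\ \ h^{(1)}(y) \wedge \varphi(x,y) \rightarrow h^{(2)}(x);\ \ h^{(2)}(y) \wedge \varphi(x,y) \rightarrow h^{(3)}(x)\ \}$$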

Given a set $S(R)$ of subrelations of $R$, let

$$R_j = \bigcup_{r=0}^{j} EXT(h^{(r)}) \qquad (2.1)$$

be the union of the extensions, evaluated on $F$, of all the $h^{(r)}$'s $(0 \le r \le j)$. A simply
recurrent theory has the interesting property proved in the following:

Theorem 1: Given a simply recurrent theory $T_s$, if there exists a number $n$ such that the
extension on $F$ of the predicate $h^{(n+1)}$ is contained in $R_n$, then $EXT(h^{(r)}) \subseteq R_n$
$(\forall r > n+1)$. In this case $T_s$ will be said to be n-complete.

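Proof: By hypothesis, $T_s$ contains the following clauses:

$$\begin{array}{ll}
\alpha(x,y) \rightarrow h^{(0)}(x) & R_0 = EXT(h^{(0)}) \\
h^{(0)}(y) \wedge \varphi(x,y) \rightarrow h^{(1)}(x) & R_1 = EXT(h^{(0)}) \cup EXT(h^{(1)}) \\
\quad\vdots & \quad\vdots \\
h^{(n-1)}(y) \wedge \varphi(x,y) \rightarrow h^{(n)}(x) & R_n = EXT(h^{(0)}) \cup \ldots \cup EXT(h^{(n)}) \\
h^{(n)}(y) \wedge \varphi(x,y) \rightarrow h^{(n+1)}(x) & EXT(h^{(n+1)}) \subseteq R_n
\end{array}$$

The thesis is immediately proved by noticing that, for any set of objects $a \in R_n$, any set
of objects $b$ generated from $a$ by one of the clauses belonging to $T_s$ belongs to $R_n$,
either by definition of $R_n$ itself or by hypothesis. Then, since $EXT(h^{(n+1)}) \subseteq R_n$,
the relation $R_{n+k} \subseteq R_n$ $(\forall k \ge 1)$ also holds true. q.e.d.

■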

Theorem 1 guarantees that if the n-th application of a recursive clause does not
generate any new instance, then no further application of the same clause will.
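Theorem 1 suggests a direct way to find n on a finite learning set. The sketch below assumes unary subrelation predicates and a single simple clause per level, so $C_1$ has one element. It also assumes the ground facts for α and φ are given as sets of (x, y) pairs. `level_extensions` is a hypothetical helper, not part of RTL as described. It computes $EXT(h^{(0)}), EXT(h^{(1)}), \ldots$ and stops at the first n with $EXT(h^{(n+1)}) \subseteq R_n$.

```python
from typing import List, Optional, Set, Tuple

Pairs = Set[Tuple[str, str]]


def level_extensions(alpha: Pairs, phi: Pairs,
                     max_levels: int = 1000) -> Tuple[List[Set[str]], Optional[int]]:
    """Extensions EXT(h^(0)), EXT(h^(1)), ... of the simply recurrent theory
        alpha(x, y) -> h0(x);   h_j(y) and phi(x, y) -> h_{j+1}(x)
    evaluated on ground facts.  Stops at the first n with
    EXT(h^(n+1)) subset of R_n and returns (extensions, n); n is None if the
    bound is reached first."""
    exts = [{x for x, _ in alpha}]
    r_n = set(exts[0])                                   # R_0
    for n in range(max_levels):
        nxt = {x for x, y in phi if y in exts[-1]}       # EXT(h^(n+1))
        exts.append(nxt)
        if nxt <= r_n:
            return exts, n                               # the theory is n-complete
        r_n |= nxt                                       # R_{n+1}
    return exts, None


chain = {("b", "a"), ("c", "b"), ("d", "c")}   # phi(x, y) facts forming a chain
cycle = {("b", "a"), ("a", "b")}               # phi(x, y) facts forming a cycle
```

On the chain the last level is empty. On the cycle the theory is already 1-complete, but $EXT(h^{(2)})$ is not empty.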

Definition 6: Given a simply recurrent theory $T_s$, let $S(R)$ be the set of subrelations of the
same relation $R$ occurring in $T_s$. The renaming of all the predicates $h^{(j)} \in S(R)$ with a
unique name $h$ will be said to be a recursive generalisation of $T_s$. The obtained new theory
$T$ will be called a simply recursive theory. □

For  simply  recurrent  theories  the  following  theorem  holds. 

Theorem 2: Given an n-complete, simply recurrent theory $T_s$, let us rename all the
predicates belonging to $S(R)$ as $h$, obtaining a new theory $T$. Then, the extension of $h$
on $F$ in the new theory $T$ will be equal to $R_n$.

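Proof: By hypothesis, $T_s$ contains the following clauses:

$$\begin{array}{ll}
\alpha(x,y) \rightarrow h^{(0)}(x) & R_0 = EXT(h^{(0)}) \\
h^{(0)}(y) \wedge \varphi(x,y) \rightarrow h^{(1)}(x) & R_1 = EXT(h^{(0)}) \cup EXT(h^{(1)}) \\
\quad\vdots & \quad\vdots \\
h^{(n-1)}(y) \wedge \varphi(x,y) \rightarrow h^{(n)}(x) & R_n = EXT(h^{(0)}) \cup \ldots \cup EXT(h^{(n)}) \\
h^{(n)}(y) \wedge \varphi(x,y) \rightarrow h^{(n+1)}(x) & R_{n+1} = R_n
\end{array}$$

By renaming all the $h^{(j)}$'s $(j \ge 0)$ as $h$, we obtain a new theory $T$ containing the
following clauses: $\alpha(x,y) \rightarrow h(x)$, $h(y) \wedge \varphi(x,y) \rightarrow h(x)$.

The thesis follows from the observation that the original set of clauses in $T_s$
corresponds to the iterative application of the two new recursive clauses in $T$.
Then, $EXT(h) = R_n$. q.e.d. ■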

Theorem  2  guarantees  that  recursive  generalisation  of  an  n-complete  theory  does  not 
change  its  extension. 
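Theorem 2 can be checked on small examples by comparing two computations. Both use the same assumptions as before: unary h, one recursive clause, and facts given as (x, y) pairs. The first is a naive bottom-up fixpoint of the renamed recursive theory. The second is the union $R_n$ of the levels of the original simply recurrent theory. Both function names are hypothetical.

```python
from typing import Set, Tuple

Pairs = Set[Tuple[str, str]]


def recursive_extension(alpha: Pairs, phi: Pairs) -> Set[str]:
    """Naive bottom-up evaluation of the simply recursive theory
        alpha(x, y) -> h(x);   h(y) and phi(x, y) -> h(x)
    on ground facts; returns EXT(h)."""
    h = {x for x, _ in alpha}
    while True:
        new = {x for x, y in phi if y in h} - h
        if not new:
            return h
        h |= new


def union_of_levels(alpha: Pairs, phi: Pairs) -> Set[str]:
    """R_n of the unrenamed theory, for the first n at which it is n-complete."""
    level = {x for x, _ in alpha}
    r_n = set(level)
    while True:
        level = {x for x, y in phi if y in level}   # next EXT(h^(j))
        if level <= r_n:
            return r_n
        r_n |= level


chain = {("b", "a"), ("c", "b"), ("d", "c")}
cycle = {("b", "a"), ("a", "b")}
```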


Theorem 3: Given a theory $T$ containing an n-complete, simply recurrent subtheory $T_s$,
the recursive generalisation of $T_s$, obtained by renaming all the predicates in $S(R)$ as $h$,
will not change the extension on $F$ of any other predicate $k$ occurring in $T$, provided
that, for each clause belonging to $T - T_s$ and depending upon some predicate $h^{(j)} \in S(R)$,
there exists a correspondent clause for each other predicate $h^{(r)} \in S(R)$ $(0 \le r \le n)$.

Proof:  By  hypothesis,  T,  contains  the  following  clauses: 

a(x.y)  -♦h'^Vx)  Ro  =  EXT(h"”) 

h'‘”(y)A<p(x,y)-^h<'’(x)  R,  =  EXIfh*”*)  U  EXT(h“') 

h*  (y)A(p(x,y)  h<"'(x)  R„  =  EXTfh'"*)  U  ...  U  EXT(h‘"’) 

h<">(y)A<P(x.y)-4h<"*''(x)  =  K 

In  T,  the  sub-theory  T*  is  substituted  by:  a(x.y)  h(x),  h(y)  a  (p(x,y)  — >  h(x). 
Let  us  suppose  that  the  clause  c  s  h^'(y)  a  \|/(x,y)  -»  u(x)  belongs  to  T-T,; 
moreover,  no  other  h*'*  (0  ^  i  ^  n)  shall  occur  in  the  formula  v]/. 

By  hypothesis,  a  clause  correspondent  of  c  exists  for  each  other  predicate  h*'*  e 
S(R),  i.e,:  h‘‘”(y)AV(x,y)  u(x) . h‘"^  (y)A\)/(x,y)  ->  u{x). 

Renaming all the predicates in $S(R)$ also implies renaming the occurrences of any $h^{(r)} \in S(R)$ in the definitions of $u$. Then the set of clauses defining $u$ will be substituted by the single clause: $h(y) \wedge \psi(x,y) \rightarrow u(x)$.

By iteratively applying the above clause and remembering that, from Theorem 2, $\mathrm{EXT}(h) = R_n$, we can also conclude that the extension of $u(x)$ has not changed:

$$
\mathrm{EXT}(u) = \bigcup_{r=0}^{n} \mathrm{EXT}\big(h^{(r)}(y) \wedge \psi(x,y)\big)
$$

■


Theorem 3 establishes conditions under which recursive generalisation does not require any extensional revision of the theory.

Another problem emerging in learning recursive theories is the occurrence of infinite loops, when the same constants can be bound to variables occurring both in the left-hand and in the right-hand side of a recursive clause. The following theorem states conditions under which loops cannot occur.

Theorem 4: Given an n-complete, simply recurrent theory $T_s$, the simply recursive theory $T$, obtained by renaming as $h$ all the predicates belonging to $S(R)$, will have an acyclic ground graph on $F$ iff the extension on $F$ of the predicate $h^{(n+1)} \in S(R)$ is empty.

Proof: For the if part of the thesis, we observe that, by hypothesis, $\mathrm{EXT}(h^{(n+1)}) = \emptyset$. We have to prove that $T$ has an acyclic ground graph.

Let $\sigma(x,y) \rightarrow h(x)$ and $h(y) \wedge \varphi(x,y) \rightarrow h(x)$ be the recursive definition of $h$ in $T$. First we observe that, if $\mathrm{EXT}(h^{(n+1)}) = \emptyset$, then also $\mathrm{EXT}(h^{(r)}) = \emptyset$ ($\forall r \ge n+2$). In fact, the extension $\mathrm{EXT}(h^{(r)})$ corresponds to the clause $h^{(r-1)}(y) \wedge \varphi(x,y) \rightarrow h^{(r)}(x)$ ($r \ge n+2$), whose extension is empty, being implied by an already empty premise. Then the recursion actually stops at the $(n+1)$-th iteration and there cannot be any infinite loop.




Let us now prove the only if part of the thesis. By hypothesis, $T$ has an acyclic ground graph on $F$. This means that no two sets of variables $a$ and $b$ exist such that: $h(b) \wedge \varphi(a,b) \rightarrow h(a)$ and $h(a) \wedge \varphi(b,a) \rightarrow h(b)$.

In other words, each iteration of the recursive clause $h(y) \wedge \varphi(x,y) \rightarrow h(x)$ will cover, in $F$, disjoint sets of the relation corresponding to $h$. As $F$ is a finite set, $\mathrm{EXT}(h) \cap F$ is also a finite set. Then there will exist an $n \le |\mathrm{EXT}(h) \cap F|$ such that $\mathrm{EXT}(h^{(n+1)})$ must be empty.

q.e.d. ■
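The equivalence in Theorem 4 can be checked mechanically on small examples. The sketch below is illustrative only: it reads $\mathrm{EXT}(h^{(r)})$ as the set of atoms produced by the $r$-th application of the recurrent clause (an assumption about the intended reading), and compares "some level becomes empty" with a depth-first cycle test on the ground graph of $h(y) \wedge \varphi(x,y) \rightarrow h(x)$ restricted to derivable atoms. The function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (x, y) tuple of sigma or phi (illustrative encoding)


def some_level_is_empty(sigma: Set[Pair], phi: Set[Pair]) -> bool:
    """Apply h^(r)(y) & phi(x,y) -> h^(r+1)(x) repeatedly and report whether a
    level becomes empty. Each level is a function of the previous one over a
    finite set of constants, so a run that never empties must repeat a level."""
    level = frozenset(x for (x, _y) in sigma)
    seen = set()
    while level and level not in seen:
        seen.add(level)
        level = frozenset(x for (x, y) in phi if y in level)
    return not level


def ground_graph_is_acyclic(sigma: Set[Pair], phi: Set[Pair]) -> bool:
    """Cycle test on the ground graph h(y) -> h(x), restricted to derivable atoms."""
    ext = {x for (x, _y) in sigma}
    while True:                                   # least fixpoint EXT(h)
        new = {x for (x, y) in phi if y in ext} - ext
        if not new:
            break
        ext |= new
    succ: Dict[str, Set[str]] = {v: set() for v in ext}
    for (x, y) in phi:
        if y in ext:
            succ[y].add(x)                        # x is in ext because ext is closed
    state = {v: 0 for v in ext}                   # 0 unvisited, 1 on stack, 2 done

    def on_cycle(v: str) -> bool:
        state[v] = 1
        for w in succ[v]:
            if state[w] == 1 or (state[w] == 0 and on_cycle(w)):
                return True
        state[v] = 2
        return False

    return not any(state[v] == 0 and on_cycle(v) for v in ext)
```

Both tests agree on chains, cycles, diamonds and cycles unreachable from the base clause, which is what the theorem predicts on a finite $F$.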

The inductive algorithm proposed in the next section learns a network of simply recursive theories. We will now investigate under which conditions different simply recursive theories can be merged without the need for extensional tests.

Definition 7: Given two simply recursive theories $T_1$ and $T_2$, defining subrelations $h_1$ and $h_2$ respectively, $T_1$ will be said to be dependent on $T_2$ if it contains at least one clause where $h_2$, or another subrelation $k$ depending on $h_2$, occurs in its body. □

Definition 8: A set of simply recursive theories will be called a simply recursive network SRN iff, for each pair of simply recursive theories $(T_1, T_2)$ belonging to SRN, either $T_1$ depends on $T_2$, or $T_2$ depends on $T_1$, or both depend on a third one, $T_3$, belonging to SRN. Moreover, for any $T_i \in$ SRN, the primitive definitions of $h_i$ should contain at most one occurrence of predicates corresponding to subrelations of another relation $R_j$, for any $j$. □
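A small sketch of the dependency conditions of Definitions 7 and 8, assuming each theory is identified by a name and `uses[t]` (a hypothetical encoding) lists the theories whose subrelations occur in some clause body of `t`. Dependency is followed transitively to account for "another subrelation $k$ depending on $h_2$"; the "at most one occurrence" condition of Definition 8 is not checked.

```python
from itertools import combinations
from typing import Dict, Set


def depends(a: str, b: str, uses: Dict[str, Set[str]]) -> bool:
    """Definition 7, followed transitively: does theory `a` depend on theory `b`?
    uses[t] lists the theories whose subrelation occurs in some clause body of t."""
    stack, seen = list(uses.get(a, ())), set()
    while stack:
        t = stack.pop()
        if t == b:
            return True
        if t not in seen:
            seen.add(t)
            stack.extend(uses.get(t, ()))
    return False


def is_network(theories: Set[str], uses: Dict[str, Set[str]]) -> bool:
    """Pairwise condition of Definition 8 only; the 'at most one occurrence'
    restriction on primitive definitions is not checked here."""
    for a, b in combinations(sorted(theories), 2):
        if depends(a, b, uses) or depends(b, a, uses):
            continue
        # otherwise both must depend on some third theory of the set
        if not any(depends(a, c, uses) and depends(b, c, uses)
                   for c in theories - {a, b}):
            return False
    return True
```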

Definition 9: A simply recursive network SRN is said to be complete iff, when any correspondent of a clause $c$ establishing a dependency between a pair of simply recursive theories $(T_1, T_2)$ is added to SRN, the extension of SRN on $F$ does not change.

Theorem 5: Given a simply recursive network SRN, if SRN is complete, then renaming all the subrelations of the same relation $R$ to a unique name $h$ will not change the extension of SRN on $F$.

Proof:  The  proof  is  analogous  to  that  of  Theorem  2.  ■ 

Finally, we recall the definition of stratified theories due to Apt et al. [5].

Definition 10: A theory $T$ is stratified if there is a mapping stratum from the literals defined in $T$ to the countable ordinals such that for every clause $L_1 \wedge L_2 \wedge \dots \wedge L_n \rightarrow A$ ($n \ge 0$) in $T$ the following conditions hold for every $1 \le i \le n$:

(a) if $L_i$ is positive, say $L_i$ is $p$, then $stratum(A) \ge stratum(p)$;

(b) if $L_i$ is negative, say $L_i$ is $\neg p$, then $stratum(A) > stratum(p)$. □

Informally,  a  theory  is  stratified  if  no  recursive  loop  across  negated  literals  occurs. 
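A standard way to test Definition 10 on a finite program is to raise strata until conditions (a) and (b) hold, and to give up once a stratum exceeds the number of predicates, which can only happen when a cycle passes through a negated literal. The Python sketch below assumes clauses given as a head predicate plus a list of (predicate, is_positive) body literals; `stratify` is a hypothetical name, not the procedure Stratify of the paper.

```python
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, bool]          # (predicate, is_positive)
Clause = Tuple[str, List[Literal]]  # (head predicate, body literals)


def stratify(program: List[Clause]) -> Optional[Dict[str, int]]:
    """Return a stratum for every predicate satisfying (a) and (b) of
    Definition 10, or None if the program is not stratified."""
    preds = {head for head, _ in program} | {p for _, body in program for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in program:
            for pred, positive in body:
                # (a) stratum(head) >= stratum(p);  (b) stratum(head) > stratum(p)
                need = stratum[pred] if positive else stratum[pred] + 1
                if stratum[head] < need:
                    if need > len(preds):   # only reachable through a negative cycle
                        return None
                    stratum[head] = need
                    changed = True
    return stratum
```

In the tests, the second program mimics what happens when subrelations used positively and negatively are renamed to one predicate, as in example (3.1) of Section 3.3.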

3  Learning Recursive Theories

In  this  section  we  will  introduce  a  new  algorithm,  named  RTL  (Recursive  Theory 
Learner),  that  learns  stratified  theories  by  assembling  networks  of  simply  recursive 
theories. 

First, the algorithm tries to discover recurrent structures in the learning events by iterating clauses discovered by induction. In this way, n-complete simply recurrent theories are discovered and then generalised to simply recursive theories. Afterwards, simply recursive networks (SRNs) that are proved complete are generalised further by applying Theorem 5.




The process can be clarified through an example. Suppose $C_1$: $h_1^{(0)}(y) \wedge \varphi(x,y) \rightarrow h_1^{(1)}(x)$ and $C_2$: $h_2^{(0)}(y) \wedge \psi(x,y) \rightarrow h_1^{(0)}(x)$ are two clauses discovered by induction in previous steps. Suppose, moreover, that a new definition for predicate $h_2^{(1)}$ is found in some other step; then the new clause $h_2^{(1)}(y) \wedge \psi(x,y) \rightarrow h_1^{(2)}(x)$ will be created, trying to classify new instances of $h_1$. We will call this operation an iteration step on $C_2$. If the attempt is successful because new instances of $h_1$ are covered, a new trial is made by iterating clause $C_1$ and creating clause $h_1^{(2)}(y) \wedge \varphi(x,y) \rightarrow h_1^{(3)}(x)$. The process then goes on, iterating similar clauses until an iteration step is reached where no new instances of $h_1$ are covered or an inconsistency is found (i.e. a clause matches facts it should not match). How to deal with inconsistencies is a crucial point and will be discussed later on. If no inconsistency occurs, when the iteration on $h_1$ ends, an n-complete simply recurrent theory $T_s$ has been discovered, which can be recursively generalised to $T$.

Now the subrelation of $h_1$ defined by $T$ is evaluated and will be used to start an analogous iterative process for other target relations. The process goes on in this way until no iteration step covers new instances. We will prove later on that, when this happens, a complete SRN has been found (according to Definition 9). If some target relation is still not completely covered, induction will be called again in order to find new clauses (primitive and recurrent) to restart the process. However, the goal is to build up SRNs as large as possible in order to generalise them into a unique recursive theory according to Theorem 5. To this aim, clauses where target relation names do not occur negated (positive clauses) are sought by induction as long as possible. In this way, pre-existing SRNs continue to grow.

When  induction  is  unable  to  discover  positive  clauses,  the  only  hope  to  complete  the 
learning  process  is  to  discover  clauses  where  some  target  relation  occurs  in  negated 
form  (negative  clauses).  If  this  is  done,  a  stratum  is  created  in  order  to  prevent  the 
generation  of  recursive  loops  across  the  negated  literals.  Procedure  Stratify,  responsible 
for  stratum  construction,  prevents  recursion  across  strata. 

3.1  The  Main  Algorithm 

Algorithm RTL, described in Fig. 1, shows the behaviour described so far. However, several details still need to be explained. Clause iteration is controlled using specific data structures. In particular, a set (DEPENDENCIES) of clause-schemes is maintained in order to select the clauses to be iterated. A clause-scheme is a clause like $C_1$ or $C_2$ where the real names of the target relations $H$ are used in place of the subrelation names. Moreover, a list ACTIVE(h) is used to schedule the iteration steps for each target relation $h \in H$. When a definition for a subrelation $h^{(i)}$ of $h$ is discovered (by iteration or by induction), for each clause-scheme $s$ depending on $h$ the procedure Trigger inserts an item $\langle h^{(i)}, s \rangle$ in the ACTIVE list associated with the relation name occurring in the head of $s$. When a scheme is extracted from the current ACTIVE list, the procedure Iterate creates a suitable correspondent by renaming the relation names in $s$.
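The bookkeeping of Trigger and Iterate can be sketched as follows. This is a simplified, hypothetical rendering: the class and method names are ours, each clause-scheme is assumed to mention exactly one target relation in its body, clauses are kept as strings rather than evaluated on $F$, and fresh subrelation names come from a single counter.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from itertools import count
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Scheme:
    """Clause-scheme: subrelation names replaced by target-relation names."""
    head: str      # target relation in the head, e.g. "h1"
    body_rel: str  # target relation occurring in the body, e.g. "h2"
    formula: str   # remaining body literals, kept symbolic, e.g. "psi(x,y)"


class IterationScheduler:
    def __init__(self, schemes: List[Scheme]) -> None:
        self.dependencies = list(schemes)  # DEPENDENCIES
        self.active: Dict[str, Deque[Tuple[str, Scheme]]] = defaultdict(deque)  # ACTIVE(h)
        self._fresh = count(1)

    def newname(self, relation: str) -> str:
        """Fresh subrelation name for `relation` (naming scheme is illustrative)."""
        return f"{relation}^({next(self._fresh)})"

    def trigger(self, subrel: str, relation: str) -> None:
        """A definition of subrelation `subrel` of `relation` was found: queue every
        scheme depending on `relation` in the ACTIVE list of the scheme's head."""
        for s in self.dependencies:
            if s.body_rel == relation:
                self.active[s.head].append((subrel, s))

    def iterate(self, relation: str) -> Tuple[str, str]:
        """Pop the next item of ACTIVE(relation) and build its correspondent clause."""
        subrel, s = self.active[relation].popleft()
        new_head = self.newname(s.head)
        return f"{subrel}(y) & {s.formula} -> {new_head}(x)", new_head
```

In RTL the correspondent produced by `iterate` would next be evaluated on $F$, and Trigger would be called for its head only if new instances were covered; the sketch omits that test.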

The inductive activity is scheduled using two lists: TODO and JUSTDONE. The first one contains the target relations which are still to be processed by induction, whereas the second one contains the relations that have already been processed but are not yet completely covered. Every time a useful subrelation is learned, the list JUSTDONE is merged into TODO in order to give induction another chance to exploit the newly available information.




Algorithm RTL:

Let DEPENDENCIES be the set of mutually dependent clause schemes. Let, moreover, a queue ACTIVE(h_i) of pairs <subrelation, clause-scheme> be defined for each target relation h_i.

Let TODO and JUSTDONE be two sets of target relations used to schedule the inductive activity among the target relations.

    Set DEPENDENCIES = ∅, ACTIVE(h_i) = ∅ (1 ≤ i ≤ N), JUSTDONE = ∅
    Set TODO = H
    while TODO ≠ ∅ do
        if ∃ h_i such that ACTIVE(h_i) ≠ ∅ then
            Set ACTIVE = ACTIVE(h_i), ACTIVE(h_i) being chosen according to
                some preference order established among the non-empty ones
            Set h = h_i
        else
            Set h = first(TODO) and remove it from TODO
            Set ACTIVE = ACTIVE(h)
            P_h = Newname(); Induce some primitive definition ψ for P_h
            R_h = Newname(); Induce some recurrent φ for R_h
            Update DEPENDENCIES with ψ and φ; Insert <P_h, φ> in ACTIVE
        while ACTIVE ≠ ∅ do
            Set φ = First(ACTIVE); Set R_h = Newname(); Iterate(φ, R_h)
            if an inconsistency occurs then Refine(φ)
                if the inconsistency remains then Stratify before the last iteration of φ
            if φ is a recurrent on h and new instances have been covered
                then append <R_h, φ> to ACTIVE
            Set P_h = Newname(); Apply recursive generalization for P_h
            Trigger DEPENDENCIES using P_h and update ACTIVE lists
            Set group mark on P_h
        Update the global extension EXT_g(h) for h
        if h is not completely covered then Append h to JUSTDONE
        if new instances have been covered in the last cycle then
            Append JUSTDONE to TODO
            Set JUSTDONE = ∅
        else if TODO = ∅ then
            if JUSTDONE ≠ ∅ then
                Try to induce a negative definition ψ for some h in JUSTDONE
                Stratify before ψ
                if successful then
                    Append JUSTDONE to TODO
                    Set JUSTDONE = ∅
            else Stratify

Fig. 1. Abstract scheme of Algorithm RTL for inducing recursive theories.

Now the question is: which predicates should induction use in order to discover new clauses? The solution adopted here is based on a heuristic aimed at generating an SRN that can be proved complete without further extensional tests.

First of all, we notice that predicates denoting subrelations obtained in strata below the current one (after generalisation) can be used as legal primitive predicates in addition to the initial set of predicates $P$. In fact, the corresponding definitions will not be modified any further.




Second, all predicates denoting subrelations defined by simply recursive theories in the current status could legally be used as primitive predicates. However, one of the problems that can arise is an excessive growth in the number of predicates that can be used in the induction phase. On the other hand, we are interested in constructing an SRN that is, if possible, complete. This means that all possible correspondents to clauses establishing dependency relations between simply recursive theories should always be tried.

A method that is semantically equivalent, but computationally much more effective, for checking all the correspondents of a clause $c$ (generated by induction) consists in letting induction use as a primitive predicate not the single subrelations, but a relation $h_2^S$ defined as the union of all subrelations of $h_2$ already found in the stratum. In this way, each clause of the type $h_2^S(y) \wedge \psi(x,y) \rightarrow h_1^{(i)}(x)$ found by induction will be equivalent to the whole set of correspondents obtained by substituting $h_2^S(y)$ with the single subrelations $h_2^{(r)}$ already existing in the stratum. Relation $h_2^S$ will grow as long as the process continues; therefore different names have to be used to distinguish its different instances.
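The equivalence exploited here is that an existential join distributes over union: evaluating one clause whose body uses $h_2^S$ gives the same head extension as evaluating every correspondent built from a single $h_2^{(r)}$ and merging the results. A minimal Python check, assuming unary subrelations and $\psi$ given as a set of $(x, y)$ pairs (all function names hypothetical):

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (x, y) tuple of psi (illustrative encoding)


def apply_clause(body_ext: Set[str], psi: Set[Pair]) -> Set[str]:
    """Extension of the head of  b(y) & psi(x,y) -> h1(x)  given EXT(b)."""
    return {x for (x, y) in psi if y in body_ext}


def via_union(subrels: Iterable[Set[str]], psi: Set[Pair]) -> Set[str]:
    """A single clause over h2^S, the union of the subrelations found so far."""
    h2_s = set().union(*subrels)
    return apply_clause(h2_s, psi)


def via_correspondents(subrels: Iterable[Set[str]], psi: Set[Pair]) -> Set[str]:
    """One correspondent clause per subrelation h2^(r), results merged."""
    result: Set[str] = set()
    for ext in subrels:
        result |= apply_clause(ext, psi)
    return result
```

The union version performs one join instead of one per subrelation, which is the computational saving the text refers to.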

Therefore, we are now able to prove the following theorem about the clause iteration process in Algorithm RTL.

Theorem 6: Given Algorithm RTL, when ACTIVE(h) = ∅ for every $h \in H$, all SRNs in the current stratum are complete.

Proof: Informally, the theorem can be proved by observing that, on the one hand, each clause generated by induction is implicitly equivalent to the set of all its correspondents generated for all the existing subrelations. On the other hand, each time the definition of a new subrelation $h^{(i)}$ is found, all the possible correspondents for $h^{(i)}$ are automatically iterated by the learning algorithm. Then the process halts because no new facts can be generated by creating new correspondents. ■

3.2  Dealing  with  Inconsistencies 

Possible inconsistencies, detected while iterating clauses, are resolved by the procedure Refine. However, the main motivation of the learning strategy proposed here is to escape the complexity inherent in the truth maintenance of the learning process, which in the presence of recursion can easily become intractable. Eliminating inconsistencies by making a clause more specific, as is usually done, involves a revision of the extension of all the correspondent clauses instantiated so far. Such a procedure could be very costly. An alternative choice is to forbid recursive generalisation on the group of clauses that proved to have an inconsistent correspondent in some iteration step. Eliminating an inconsistent correspondent clause from a stratum implicitly has this effect, because some SRN will remain incomplete.

Here a compromise between the two alternatives has been used. In particular, let $c$ be the clause proved inconsistent; then procedure Refine tries to find a set containing one or more new clause-schemes covering the same instances as $c$ in the past iterations but consistent in the current one. If the operation is successful, this set can be substituted for $c$ everywhere, without requiring an extensional test. Thus, the set is searched for by running induction under proper constraints.

In case such a set cannot be found, the construction of the current simply recursive theory stops and the creation of a stratum is forced by calling procedure Stratify. In this way, generalisations leading to the inconsistency are prevented.




3.3  Creating Strata

Creating a stratum aims at preventing generalisations, i.e. renamings of subrelations, that could merge simply recursive theories external to the stratum with the internal ones. In other words, SRNs cannot go across strata boundaries. Strata are created to prevent cycles through negated literals or to prevent generalisations leading to inconsistent definitions.

Let us analyse, first, the problem of avoiding recursion through negated literals. Let us consider, for instance, the following SRN containing three simply recursive theories:

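$$
\begin{aligned}
T_1 &= \{\varphi(x,y) \wedge h_1'(y) \rightarrow h_1'(x);\ \rho(x,y) \rightarrow h_1'(x)\} \\
T_2 &= \{\varphi(x,y) \wedge h_1''(y) \rightarrow h_1''(x);\ \xi(x,y) \wedge \neg h_1'(y) \rightarrow h_1''(x)\} \\
T_3 &= \{\psi(x,y) \wedge h_1'''(y) \rightarrow h_1'''(x);\ \sigma(x,y) \wedge h_1'(y) \rightarrow h_1'''(x)\}
\end{aligned}
\tag{3.1}
$$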

Predicates $h_1'$ and $h_1''$ cannot be renamed to $h_1'''$ because in the resulting theory

$$
T = \{\varphi(x,y) \wedge h_1'''(y) \rightarrow h_1'''(x);\ \rho(x,y) \rightarrow h_1'''(x);\ \xi(x,y) \wedge \neg h_1'''(y) \rightarrow h_1'''(x);\ \psi(x,y) \wedge h_1'''(y) \rightarrow h_1'''(x);\ \sigma(x,y) \wedge h_1'''(y) \rightarrow h_1'''(x)\}
$$

recursive deductions through the negated literal $\neg h_1'''(y)$ will occur. However, creating a stratum between $T_2$ and $\{T_1, T_3\}$ will avoid this kind of problem by keeping $h_1'$ distinct from $h_1''$.

Let $h_1^{(i)}$ be the subrelation used in negated form by induction; then procedure Stratify performs the following steps:

a) It identifies the network $\mathrm{SRN}(h_1^{(i)})$ that $h_1^{(i)}$ belongs to and applies Theorem 5 in order to produce the maximum possible generalisation without extensional tests.

b) For each different subrelation $h^{(r)}$ resulting in step a), the extension on $F$ of $h^{(r)}$ is constructed. Then $h^{(r)}$ is added to the set of primitive predicates $P$.

c) The current stratum is updated by removing $\mathrm{SRN}(h_1^{(i)})$.

However, things are slightly different when Stratify is called because a clause has shown an inconsistency that procedure Refine could not remove. Suppose the following three simply recursive theories have been generated:

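$$
\begin{aligned}
T_1 &= \{\varphi(x,y) \wedge h_1'(y) \rightarrow h_1'(x);\ \rho(x,y) \rightarrow h_1'(x)\} \\
T_2 &= \{\varphi(x,y) \wedge h_1''(y) \rightarrow h_1''(x);\ \xi(x,y) \wedge h_2'(y) \rightarrow h_1''(x)\} \\
T_3 &= \{\psi(x,y) \wedge h_2'(y) \rightarrow h_2'(x);\ \sigma(x,y) \wedge h_1'(y) \rightarrow h_2'(x)\}
\end{aligned}
\tag{3.2}
$$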

Suppose, moreover, that the construction of the simply recurrent theory $T_4 = \{\psi(x,y) \wedge h_2''(y) \rightarrow h_2'''(x);\ \sigma(x,y) \wedge h_1''(y) \rightarrow h_2''(x)\}$ has been interrupted because clause $\psi(x,y) \wedge h_2''(y) \rightarrow h_2'''(x)$, corresponding to the recursive clause in theory $T_3$, was found inconsistent. Renaming $h_1'$ and $h_1''$ to $h_1'''$ would implicitly lead to the introduction of an inconsistency, again because the clauses of theory $T_3$ would now be applied also to all instances of $h_1''(x)$. Let $h$ be the subrelation in the head of the clause that proved to be inconsistent; procedure Stratify then operates as follows:

a') Determine the set $S$ of all simply recursive theories $h$ depends on. (Notice that a non-complete, simply recurrent theory can be seen as a network of simply recursive theories where no recurrent clauses are present.)

a) Identify a set of SRNs covering all of $S$ and, for each one, apply Theorem 5 in order to obtain the maximum possible generalisation.




b) For each different subrelation $h^{(r)}$ resulting in step a), the extension on $F$ of $h^{(r)}$ is constructed. Then $h^{(r)}$ is added to the set of primitive predicates $P$.

c) The current stratum is updated by removing $\mathrm{SRN}(h_1^{(i)})$.

3.4  Learning Acyclic Theories

Algorithm RTL learns stratified theories according to Apt's definition [5]. However, it may be interesting to obtain theories having an acyclic ground graph on the learning events $F$ [6]. It is easy to prove that, if Theorem 4 is applicable to each simply recurrent theory constructed by Algorithm RTL, the ground graph will be acyclic on $F$. It is then possible to force the generation of an acyclic theory by restricting recursive generalisation to simply recurrent theories satisfying Theorem 4.

4  Conclusions 

In this paper, a new algorithm (called RTL) for learning recursive Horn theories has been described. The adopted approach aims at reducing the complexity due to the non-monotonicity of the learning process in the presence of mutual dependencies between target relations. In particular, the algorithm learns only recursive dependencies that do not involve revision of existing clauses. Theories learned by RTL can be stratified according to the definition given by Apt [5] and can be proved acyclic with respect to the learning set. However, this last property holds only in a restricted sense with respect to the general definition given by Apt [6].

References

1. E. Shapiro: Algorithmic Program Debugging. MIT Press, 1982.

2. R. Quinlan: Learning Logical Definitions from Relations. Machine Learning, 5, 239-266 (1990).

3. S. Muggleton and W. Buntine: Machine Invention of First-order Predicates by Inverting Resolution. Proc. Fifth International Conference on Machine Learning, Ann Arbor, 1988, pp. 339-352.

4. A.W. Biermann: The Inference of Regular Lisp Programs from Examples. IEEE Trans. on Systems, Man and Cybernetics, 8, 585-600 (1978).

5. K. Apt, H. Blair and A. Walker: Towards a Theory of Declarative Knowledge. Proc. Workshop on Foundations of Deductive Databases and Logic Programming, Washington, 1986.

6. K. Apt and M. Bezem: Acyclic Programs. Proc. 7th Conference on Logic Programming, Jerusalem, 1990, pp. 617-633.


The  Many  Faces  of  Inductive  Logic 
Programming 


Luc De Raedt¹ and Nada Lavrač²

¹ Department of Computing Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium
² Jozef Stefan Institute, Jamova 39, 61111 Ljubljana, Slovenia


Abstract. Inductive logic programming is a research area which has its roots in machine learning and computational logic. A short introduction to this area is given. The paper investigates the many faces of inductive logic programming and outlines their applications in knowledge discovery and programming. Furthermore, whereas most research in inductive logic programming has focussed on learning single predicates from given datasets using a strong notion of explanation (cf. the well-known systems GOLEM and FOIL), we also investigate a weaker notion of explanation and the learning of multiple predicates. The weaker setting avoids the order dependency problem of the strong notion when learning multiple predicates, extends the representation of the induced hypotheses to full clausal logic, and can be applied to different types of application.


1  Introduction 

Inductive  logic  programming  (ILP)  [22,  4]  can  be  considered  as  the  intersection 
of  inductive  learning  and  computational  logic.  From  inductive  machine  learning, 
ILP  inherits  its  goal:  to  develop  tools  and  techniques  to  induce  hypotheses  from 
observations  (examples)  or  to  synthesize  new  knowledge  from  experience.  By 
using  computational  logic  as  the  representational  mechanism  for  hypotheses  and 
observations,  ILP  can  overcome  the  two  main  limitations  of  classical  inductive 
learning  techniques  (such  as  the  TDIDT-family  [30]): 

1. the use of a limited knowledge representation formalism (essentially propositional logic), and

2.  the  inability  to  use  substantial  domain  knowledge  in  the  learning  process. 

The first limitation is important because many problems from various domains of expertise can only be expressed in first-order logic and not in propositional logic, which implies that there are inductive learning tasks for which no propositional learner (and none of the classical empirical learners) can be effective (see e.g. [23]). The difficulty of employing domain knowledge is also crucial because one of the well-established findings of artificial intelligence (and machine learning) is that the use of domain knowledge is essential to achieve intelligent behaviour. First results in applying ILP (cf. e.g. [26, 15, 9]) show that the use of logic as a representation mechanism for inductive systems is justified not only from a theoretical point of view, but also from a practical one.




From computational logic, ILP inherits not only its representational formalism, but also its theoretical orientation as well as some well-established techniques. Indeed, in contrast to most other practical approaches to inductive learning, ILP is also interested in properties of inference rules, in convergence (e.g. soundness and completeness) of algorithms and in the computational complexity of procedures. Furthermore, because of the common representation framework, ILP is relevant to computational logic, deductive databases, knowledge base updating, algorithmic debugging, abduction, constraint logic programming, program synthesis and program analysis, and vice versa.

In  this  paper,  we  investigate  the  different  faces  of  ILP.  We  first  discuss  two 
different  semantics  for  ILP:  a  strong  semantics  for  ILP  as  chosen  by  Muggleton 
[21]  and  incorporated  in  many  well-known  systems  [22,  25,  31,  4,  5,  18],  and 
a  weak  semantics  introduced  by  Helft  [13]  and  later  followed  by  [12,  7].  It  is 
shown  that  the  strong  semantics  leads  to  some  problems  when  learning  multiple 
predicates, and that these problems can be avoided using the weak semantics.  The
latter  also  allows  for  the  induction  of  full  first  order  clauses.  The  two  semantics 
and  the  different  dimensions  are  relevant  to  all  ILP  techniques.  However,  the 
paper  is  focussed  on  the  so-called  refinement  (general  to  specific)  techniques  used 
in  empirical  ILP  systems.  Other  faces  of  ILP,  referring  to  different  dimensions 
as  perceived  by  users,  are  also  discussed.  Finally,  we  sketch  some  applications 
of  ILP  to  knowledge  discovery  in  databases  [28]  and  programming  and  discuss 
their  relation  to  the  semantics. 

The  paper  is  organized  as  follows:  in  section  2,  we  specify  the  problem  of 
ILP,  and  study  its  different  faces;  in  section  3,  we  survey  some  ILP  techniques 
for  strong  ILP;  in  section  4,  we  investigate  refinement  approaches  to  ILP;  in 
section  5,  we  study  the  problem  of  multiple  predicate  learning  using  the  strong 
and  the  weak  settings  for  ILP;  finally,  in  section  6,  we  conclude  and  touch  upon 
related  work. 


2  Problem  specification 


Roughly  speaking,  ILP  starts  from  an  initial  background  theory  T  and  some 
evidence  E  (examples).  The  aim  is  then  to  induce  a  hypothesis  H  that  together 
with  T  explains  some  properties  of  E.  In  most  cases  the  hypothesis  H  has  to 
satisfy  certain  restrictions,  which  we  shall  refer  to  as  the  bias  B.  Bias  includes 
prior expectations and assumptions, and can therefore be considered as the logically unjustified part of the background knowledge. Bias is needed to reduce the number of candidate hypotheses. On the other hand, an inappropriate bias can prevent the learner from finding the intended hypotheses.

In  this  section,  we  shall  recall  some  logic  programming  concepts  and  use  them 
to  formally  define  the  notions  of  hypothesis,  theory,  and  evidence.  Furthermore, 
we  shall  discuss  two  alternative  semantics  for  ILP:  the  classical  one,  defined  by 
[21],  and  the  alternative  one  of  [13,  12,  7]. 




2.1  Some  Logic  Programming  Concepts 

Definition 1. A clause is a formula of the form $A_1, \dots, A_m \leftarrow B_1, \dots, B_n$ where the $A_i$ and $B_i$ are positive literals (atomic formulae).

The above clause can be read as $A_1$ or ... or $A_m$ if $B_1$ and ... and $B_n$. All variables in clauses are universally quantified, although this is not explicitly written. Extending the usual convention for definite clauses (where $m = 1$), we call $A_1, \dots, A_m$ the head of the clause and $B_1, \dots, B_n$ the body of the clause. A fact is a definite clause with an empty body ($m = 1$, $n = 0$). Throughout the paper, we shall assume that all clauses are range restricted, which means that all variables occurring in the head of a clause also occur in its body.

An example is a ground fact together with its truth value in the intended interpretation; positive examples are true; negative ones are false. The sets of positive and negative examples will be denoted as $E^+$ and $E^-$; $E = E^+ \cup E^-$. In ILP, the background theory $T$ and hypotheses $H$ are represented by sets of
clauses.  For  simplicity,  we  shall  mainly  focus  on  definite  clauses  for  representing 
hypotheses  and  theories.  Nevertheless,  parts  of  our  discussion  extend  to  the 
more  general  normal  program  clauses  [19],  which  are  sometimes  used  in  ILP 
systems  [18,  31].  Language  bias  imposes  certain  syntactic  restrictions  on  the 
form  of  clauses  allowed  in  hypotheses;  for  example,  one  might  consider  only 
constrained  clauses  [18];  these  are  clauses  for  which  all  variables  occurring  in 
the  body  also  occur  in  the  head.  Other  restrictions  frequently  employed  include 
abstract  languages  [4],  determinations  [33],  schemata  [14]  and  ij-determination 
[25]. 
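Both syntactic restrictions mentioned above, range restriction and constrained clauses, are easy to test. The sketch below assumes atoms written as (predicate, argument tuple) pairs and the Prolog convention that variables start with an upper-case letter; the helper names are hypothetical.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Clause = Tuple[List[Atom], List[Atom]]   # (head atoms A1..Am, body atoms B1..Bn)


def variables(atoms: List[Atom]) -> Set[str]:
    """Variables are terms whose first character is upper case (Prolog convention)."""
    return {t for _, args in atoms for t in args if t[:1].isupper()}


def is_range_restricted(clause: Clause) -> bool:
    """Every variable of the head also occurs in the body."""
    head, body = clause
    return variables(head) <= variables(body)


def is_constrained(clause: Clause) -> bool:
    """Every variable of the body also occurs in the head."""
    head, body = clause
    return variables(body) <= variables(head)
```

A clause such as grandparent(X,Z) ← parent(X,Y), parent(Y,Z) is range restricted but not constrained, because $Y$ occurs only in the body.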

Inductive logic programming can now be defined as follows:

Given:

- a set of examples $E$
- a background theory $T$
- a language bias $B$ that defines the hypothesis space
- a notion of explanation (a semantics)

Find: a hypothesis $H$ that satisfies the language bias and explains the examples $E$ with respect to the theory $T$.

Let us now define the strong and weak settings of ILP, which determine different notions of explanation and semantics. The terms strong and weak are due to Flach [12]. Current research on ILP mainly employs the strong setting [21]:

Definition 2. The strong setting of ILP restricts $H$ and $T$ to sets of definite clauses, and employs the strong semantics: $T \cup H \models E^+$ and $T \cup H \not\models E^-$.³

As an illustration, let $T = \{bird(tweety), bird(oliver)\}$, $E^- = \emptyset$, and $E^+ = \{flies(tweety)\}$. A valid hypothesis in the strong setting is $c_1 = flies(X) \leftarrow bird(X)$.
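For definite clauses, $T \cup H \models e$ for a ground atom $e$ holds exactly when $e$ belongs to the least Herbrand model of $T \cup H$, so the strong setting can be checked by bottom-up evaluation on small examples. The sketch below (hypothetical names, naive grounding over the constants occurring in the facts, capitalised terms as variables) reproduces the illustration above.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[Atom, List[Atom]]          # definite clause  head <- body


def is_var(term: str) -> bool:
    return term[:1].isupper()           # Prolog convention: capitalised terms are variables


def substitute(atom: Atom, theta: Dict[str, str]) -> Atom:
    pred, args = atom
    return pred, tuple(theta.get(t, t) for t in args)


def least_model(rules: List[Rule], facts: Set[Atom]) -> Set[Atom]:
    """Naive bottom-up evaluation, grounding rules over the constants in `facts`."""
    constants = sorted({t for _, args in facts for t in args})
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            vs = sorted({t for _, args in [head, *body] for t in args if is_var(t)})
            for values in product(constants, repeat=len(vs)):
                theta = dict(zip(vs, values))
                new = substitute(head, theta)
                if new not in model and all(substitute(b, theta) in model for b in body):
                    model.add(new)
                    changed = True
    return model


def strong_setting_holds(rules: List[Rule], facts: Set[Atom], hypothesis: List[Rule],
                         pos: Set[Atom], neg: Set[Atom]) -> bool:
    """T u H entails every positive example and no negative example."""
    model = least_model(rules + hypothesis, facts)
    return pos <= model and not (neg & model)
```

With $c_1$ the least model also contains flies(oliver), which is the inductive leap discussed next; adding flies(oliver) to $E^-$ would make $c_1$ unacceptable.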

³ For convenience, $T \cup H \not\models E^-$ is used as shorthand for $\forall e \in E^-: T \cup H \not\models e$.


3 


438 


Clause c1 may contribute to a solution because it is consistent, i.e. for any substitution θ for which head(c1)θ is false (that is, a negative example), body(c1)θ is also false; since $E^-$ is empty, this holds trivially. Notice that c1 realizes an inductive leap because T together with c1 entails flies(oliver).
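For function-free definite clauses, the strong semantics can be checked by computing the least Herbrand model of T ∪ H and testing each example for membership. The Python sketch below does this with naive forward chaining; the atom encoding (tuples, capitalised strings as variables) and all function names are illustrative assumptions, not part of the paper. It also makes the inductive leap of c1 visible: flies(oliver) ends up in the model although it is not an example.

```python
def is_var(term):
    # Assumption: variables are capitalised strings, constants are lower case.
    return term[:1].isupper()

def match(atom, fact, theta):
    """Extend substitution theta so that atom.theta == fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    theta = dict(theta)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if theta.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return theta

def least_model(facts, clauses):
    """Naive forward chaining for function-free definite clauses (head, body).
    Assumes every head variable also occurs in the body."""
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            thetas = [{}]
            for lit in body:  # join the body literals against the current model
                thetas = [t2 for t in thetas for f in model
                          if (t2 := match(lit, f, t)) is not None]
            for t in thetas:
                new = (head[0],) + tuple(t.get(a, a) for a in head[1:])
                if new not in model:
                    model.add(new)
                    changed = True
    return model

def strong_ok(T, H, pos, neg):
    """Strong semantics: T u H entails every positive and no negative example."""
    model = least_model(T, H)
    return all(e in model for e in pos) and not any(e in model for e in neg)

T = {("bird", "tweety"), ("bird", "oliver")}
c1 = (("flies", "X"), [("bird", "X")])  # flies(X) <- bird(X)
print(strong_ok(T, [c1], pos={("flies", "tweety")}, neg=set()))  # True
print(("flies", "oliver") in least_model(T, [c1]))  # True: the inductive leap
```

Adding flies(oliver) as a negative example makes the same check fail, which anticipates the closed world discussion below.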

Definition 3. The weak setting of ILP restricts T to a set of definite clauses, H to a set of (general) clauses, and E to positive examples, and employs the weak semantics: $\mathrm{Comp}(T \cup E) \models H$.

Comp(K)  denotes  Clark’s  completion  of  the  knowledge  base  K  [3].  The  weak 
semantics  of  ILP  is  related  to  integrity  checking  in  deductive  databases  (cf.  [32]). 

To illustrate the weak setting, reconsider the above illustration. In the weak setting, clause c1 is not a solution because there is a substitution θ = {X/oliver} for which body(c1)θ is true and head(c1)θ is false. However, the clause c2 = bird(X) ← flies(X) is a solution because for all X for which flies(X) is true, bird(X) is also true. This shows that the weak setting does not hypothesize properties that do not hold on the example set. Therefore the weak semantics realizes induction by deduction. Indeed, the induction principle of the weak setting states that the hypothesis H, which is deduced from the set of observed examples E and the theory T, holds for all possible sets of examples E'. This realizes generalization beyond the observed examples. As a consequence, properties derived in the weak setting are more certain than those derived in the strong one.
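Under the weak semantics a clause must be true in the completed database. Assuming a function-free database whose only true atoms are the listed facts, and quantifying over the constants that occur in it (a domain-closure assumption made for this sketch), the check reduces to searching for a substitution that makes the body true and every head literal false. The helper names below are hypothetical; heads are lists of literals read disjunctively, bodies conjunctively.

```python
from itertools import product

def is_var(term):
    return term[:1].isupper()  # assumption: variables are capitalised

def ground(atom, theta):
    return (atom[0],) + tuple(theta.get(a, a) for a in atom[1:])

def counterexample(db, head, body):
    """Return a substitution that makes the body true and every head literal
    false in the completed database (listed facts true, all others false),
    or None if the clause head <- body holds."""
    consts = sorted({a for fact in db for a in fact[1:]})
    variables = sorted({a for lit in head + body for a in lit[1:] if is_var(a)})
    for values in product(consts, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all(ground(b, theta) in db for b in body) and \
                not any(ground(h, theta) in db for h in head):
            return theta
    return None

db = {("bird", "tweety"), ("bird", "oliver"), ("flies", "tweety")}  # T u E+
c1 = ([("flies", "X")], [("bird", "X")])  # flies(X) <- bird(X)
c2 = ([("bird", "X")], [("flies", "X")])  # bird(X) <- flies(X)
print(counterexample(db, *c1))  # {'X': 'oliver'}
print(counterexample(db, *c2))  # None, so c2 holds in Comp(T u E)
```

The returned substitution for c1 is exactly the θ = {X/oliver} used in the argument above; an empty head encodes a denial such as ← flies(X), bird(X).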

The  differences  between  the  strong  and  the  weak  setting  are  related  to  the 
closed  world  assumption.  In  most  applications  of  strong  ILP  [15,  26],  only  the  set 
of  positive  examples  is  specified  and  the  set  of  negative  examples  is  derived  from 
this by applying the closed world assumption. In our illustration, this results in $E^-$ = {flies(oliver)}. Given this modified $E^-$, clause c1 cannot contribute to a solution in the strong setting because for σ = {X/oliver}, head(c1)σ is false while body(c1)σ is true. If, on the other hand, we ignore the difference between theory and examples by considering the problem where T = ∅, $E^+$ = {flies(tweety), bird(tweety), bird(oliver)}, and $E^-$ = {flies(oliver)} (cwa), clause c2 is also a solution in the strong setting. Intuitively, this shows that solutions to problems in the strong setting, where the closed world assumption is applied, are also valid in the weak setting. Recall from the database literature that applying the closed world assumption or Clark's completion of the database is only justified when the universe defined by the examples together with the theory is completely described (cf. also Example 1). For example, in a medical application, all patients in the database should be completely specified, which means that all their symptoms and diseases should be fully described. Notice that this is different from requiring that the complete universe is described (i.e. all possible patients).
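The closed world assumption used to derive $E^-$ can be sketched as follows: every ground atom of the target predicate, built from the constants of the (assumed completely described) universe, that is not a positive example is declared negative. The function and parameter names are illustrative only.

```python
from itertools import product

def cwa_negatives(pos, predicate, arity, constants):
    """Closed world assumption: each ground atom of `predicate` over the given
    constants that is not a positive example becomes a negative example."""
    return {(predicate, *args)
            for args in product(sorted(constants), repeat=arity)
            if (predicate, *args) not in pos}

pos = {("flies", "tweety")}
print(cwa_negatives(pos, "flies", 1, {"tweety", "oliver"}))  # {('flies', 'oliver')}
```

The result is only as trustworthy as the constant set is complete, which is the condition stated above for applying the closed world assumption.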

Solutions  in  the  strong  setting  with  the  closed  world  assumption  are  also 
solutions  in  the  weak  setting.  The  opposite  does  not  always  hold  and  this  reveals 
the  other  main  difference  between  the  two  settings.  In  the  strong  setting,  the 
induced  hypothesis  can  always  be  used  to  replace  the  examples  because  theory 
and  hypothesis  entail  the  observed  examples  (and  possibly  other  examples  as 
well). In the weak setting, the hypothesis consists of a set of properties holding for the example set. There are neither requirements nor guarantees concerning prediction. For instance, in the weak setting, clause c2 was a solution for T = {bird(tweety), bird(oliver)} and $E^+$ = {flies(tweety)}. Nevertheless, it cannot be used to predict the example in $E^+$.

The strong and weak faces of ILP are further illustrated by Example 1 in a programming context and Example 2 in knowledge discovery.

Example  1.  (Programming) 

Let E = {sort([1],[1]); sort([2,1,3],[1,2,3]); ¬sort([2,1],[1]); ¬sort([3,1,2],[2,1,3])}

and  let  T  contain  correct  definitions  for  the  predicates  permutation/2,  which 
is  true  if  the  second  argument  is  a  permuted  list  of  the  first  argument,  and 
sorted/1,  which  is  true  if  its  list-argument  is  sorted  in  ascending  order.  Given 
the  strong  setting,  a  possible  solution  to  the  inductive  logic  programming  task 
could  consist  of  the  following  clause: 

sort(X,Y) ← permutation(X,Y), sorted(Y)   (c3)

In the weak setting, using $E^+$ only, a solution could consist of the following clauses:

sorted(Y) ← sort(X,Y)
permutation(X,Y) ← sort(X,Y)
sorted(X) ← sort(X,X)

Whereas the strong setting results in a program for sorting lists, the weak setting results in a partial specification for the involved predicates. Notice that clause c3 does not hold in the weak setting because the definitions of permutation and sorted also hold for terms not occurring in E (which means that Clark's completion is not justified, cf. above). On the other hand, if we generalize the notion of a positive example to a definite clause and replace the evidence E by a correct definition of the sort predicate (using for instance the definition of quick-sort), clause c3 holds. This illustrates that the weak setting is applicable to reverse engineering.
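The contrast in Example 1 can be replayed in a few lines of Python. In this sketch the background predicates permutation/2 and sorted/1 are stood in for by Python functions (an assumption: the paper only requires T to contain correct definitions), lists are Python lists, and sort([2,1],[1]) and sort([3,1,2],[2,1,3]) are treated as the negative examples.

```python
from collections import Counter

def permutation(x, y):
    """Stand-in for permutation/2: y is a permuted list of x."""
    return Counter(x) == Counter(y)

def is_sorted(y):
    """Stand-in for sorted/1: y is in ascending order."""
    return all(a <= b for a, b in zip(y, y[1:]))

def c3(x, y):
    # sort(X,Y) <- permutation(X,Y), sorted(Y)
    return permutation(x, y) and is_sorted(y)

positives = [([1], [1]), ([2, 1, 3], [1, 2, 3])]
negatives = [([2, 1], [1]), ([3, 1, 2], [2, 1, 3])]

# Strong setting: c3 implies every positive and no negative example.
print(all(c3(x, y) for x, y in positives),
      any(c3(x, y) for x, y in negatives))  # True False

# Weak setting (E+ only): each clause must hold for every sort/2 fact in E+.
print(all(is_sorted(y) for x, y in positives))            # sorted(Y) <- sort(X,Y): True
print(all(permutation(x, y) for x, y in positives))       # permutation(X,Y) <- sort(X,Y): True
print(all(is_sorted(x) for x, y in positives if x == y))  # sorted(X) <- sort(X,X): True

# c3 fails in the weak setting: its body holds for ([2, 1], [1, 2]), but
# sort([2,1],[1,2]) is not in E+, so the completion makes its head false.
print(c3([2, 1], [1, 2]) and ([2, 1], [1, 2]) not in positives)  # True
```

The last line is the concrete reason given above why Clark's completion is not justified for c3: permutation and sorted hold for terms that do not occur in E.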

Example  2.  (Knowledge  Discovery) 

Suppose we have a database containing weather observations (cf. [36]). Assuming a simplified format, each observation could be described by a fact ob(O,T,H), where O is the label of the observation representing time, T the temperature of the observation, and H the humidity. The general description of the weather observations O could be encoded in predicates such as rain(O) and snow(O). The background theory could then contain definitions for the predicates next(O1,O2), succeeding when O1 and O2 are two subsequent observations, and in-temp(O1,O2) and in-hum(O1,O2), succeeding when the temperature (respectively the humidity) in O2 is increased relative to O1.

Let us assume the aim is to learn properties of the predicates rain and snow. In the strong ILP setting, one would start from the set of all positive examples for these predicates, apply the closed world assumption and use the result as evidence. H could then be:

rain(O) ← ob(O, very-high, very-high)
snow(O) ← next(P,O), ob(P, very-low, high)
snow(O) ← next(P,O), in-temp(O,P), rain(P)

Using the weak setting, one can start learning from the given database (without a need to generate the negative examples) and derive a hypothesis including the above clauses and:

← rain(O), snow(O)

rain(O), snow(O) ← ob(O,T,high)

ob(O, very-low, very-high), ob(O, very-low, high) ← snow(O)

In knowledge discovery, it frequently occurs that the data are noisy, i.e. that they contain random errors. When starting from noisy data, the criteria of both the weak and the strong setting should be relaxed and a limited number of mismatches between hypotheses and examples should be tolerated [17, 10].
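One simple way to relax a weak-setting check for noisy data is to count the observations on which a clause fails and accept the clause when that count stays within a threshold. The sketch below does this for two of the weather clauses above on a handful of hypothetical observations (invented for illustration, not data from [36]); it is not the noise-handling method of [17] or [10], only an illustration of tolerating a limited number of mismatches.

```python
# Hypothetical observations: label -> (temperature, humidity, rain, snow).
OBS = {
    1: ("high", "high", True, False),
    2: ("low", "high", False, True),
    3: ("very-low", "high", True, True),  # a noisy record
    4: ("high", "low", False, False),
}

def no_rain_and_snow(temp, hum, rain, snow):
    # <- rain(O), snow(O)
    return not (rain and snow)

def rain_or_snow_if_high(temp, hum, rain, snow):
    # rain(O), snow(O) <- ob(O, T, high)
    return hum != "high" or rain or snow

def mismatches(clause):
    """Labels of the observations on which the clause is false."""
    return [label for label, row in OBS.items() if not clause(*row)]

def accept(clause, max_errors):
    # Noise-tolerant acceptance: allow at most max_errors mismatches.
    return len(mismatches(clause)) <= max_errors

print(mismatches(no_rain_and_snow))  # [3]
print(accept(no_rain_and_snow, 0), accept(no_rain_and_snow, 1))  # False True
print(mismatches(rain_or_snow_if_high))  # []
```

With a threshold of zero the integrity constraint is rejected because of the single noisy record; allowing one mismatch accepts it.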

2.2  Dimensions  of  ILP 

Whereas the above problem specification sketches the faces of inductive logic programming in logical terms, practical ILP systems can be classified, from a user perspective, along four main dimensions:

- Empirical versus incremental. This dimension describes the way the examples E are obtained. In empirical ILP, the evidence is given at the start and not changed afterwards; in incremental ILP, the user supplies the examples one by one, in a piecewise fashion.

- Interactive versus non-interactive. In interactive ILP, the learner is allowed to pose questions to the user about the intended interpretation. Usually these questions query for the intended interpretation of examples or clauses.

- Predicate invention allowed or not. Predicate invention denotes the process whereby entirely new predicates (present neither in E nor in T) are induced. Predicate invention extends the vocabulary of the learner and may therefore facilitate the learning task.

- Single versus multiple predicate learning. In single predicate learning, the evidence contains examples for only one predicate and the aim is to induce a definition for this predicate; in multiple predicate learning, the aim is to learn a set of possibly interacting predicate definitions.

In Table 1, we sketch some well-known ILP systems along these four dimensions. The ILP systems sketched are: MIS [34], CLINT [4], MOBAL [14], CIGOL [24], FOIL [31], GOLEM [25], and LINUS [18]. From the table it follows that most of the systems are either incremental multiple predicate learners or empirical single predicate learners⁴. This means that essentially none of these ILP systems is applicable to knowledge discovery in databases. Knowledge discovery typically requires an empirical setting (all data are given in the database) and involves multiple predicates (as the different predicates in the database should be related to each other). Two novel extensions of FOIL that can be regarded as empirical multiple predicate learners or knowledge discovery systems will be discussed in Section 5. First, however, we give an overview of some ILP techniques, focussing on the strong setting, for which there are more results.


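system   Emp/Inc   Int/Nin   Pri/Npr   Sin/Mul
MIS      Inc       Int       Npr       Mul
CLINT    Inc       Int       Pri       Mul
MOBAL    Inc       Nin       Pri       Mul
CIGOL    Inc       Int       Pri       Mul
FOIL     Emp       Nin       Npr       Sin
GOLEM    Emp       Nin       Npr       Sin
LINUS    Emp       Nin       Npr       Sin

Table 1: Dimensions of ILP (Emp/Inc: empirical or incremental; Int/Nin: interactive or non-interactive; Pri/Npr: predicate invention or no predicate invention; Sin/Mul: single or multiple predicate learning).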


3  ILP  Techniques  in  the  strong  ILP  setting 

As is the case for most problems of artificial intelligence, ILP can be regarded as a search problem. Indeed, the space of possible solutions is determined by the syntactic bias. Furthermore, there is a decision criterion ($T \cup H \models E^+$ and $T \cup H \not\models E^-$) to check whether a candidate is a solution to a particular problem. Searching the whole space is clearly inefficient; therefore structuring the search space is necessary. Nearly all symbolic inductive learning techniques structure the search by means of the dual notions of generalization and specialization [20, 6]. For ILP, there are syntactic [29] and semantic (logical) definitions [27, 2] of these notions:

Definition 4. (Semantic Generalization) A hypothesis $H_1$ is semantically more general than a hypothesis $H_2$ with respect to theory T if and only if $T \cup H_1 \models H_2$.

Definition 5. (Syntactic Generalization or θ-subsumption) A clause c1 (a set of literals) is syntactically more general than a clause c2 if and only if there exists a substitution θ such that $c_1\theta \subseteq c_2$.

Plotkin has proved that the syntactic notion induces a lattice (up to variable renamings) on the set of all clauses. Notice that when a clause c1 is syntactically more general than a clause c2, it is also semantically more general. Clause false (the empty clause) is maximally general under both notions of generalization.
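Definition 5 suggests a direct test: search for a substitution θ that maps every literal of c1 onto some literal of c2. The backtracking sketch below encodes a literal as a (sign, predicate, arguments) tuple and, as usual for θ-subsumption, treats the variables of c2 as constants; the encoding, the predicate small/1 and the function names are assumptions for illustration. Since θ-subsumption is NP-complete in general, practical systems add indexing and pruning that this sketch omits.

```python
def is_var(term):
    return term[:1].isupper()  # assumption: variables are capitalised

def match_literal(lit, target, theta):
    """Extend theta so that lit.theta == target, or return None."""
    sign, pred, args = lit
    if (sign, pred, len(args)) != (target[0], target[1], len(target[2])):
        return None
    theta = dict(theta)
    for a, b in zip(args, target[2]):
        if is_var(a):
            if theta.setdefault(a, b) != b:  # a variable must bind consistently
                return None
        elif a != b:
            return None
    return theta

def theta_subsumes(c1, c2, theta=None):
    """True iff some substitution theta makes c1.theta a subset of c2."""
    theta = {} if theta is None else theta
    if not c1:
        return True  # the empty clause (false) subsumes every clause
    first, rest = c1[0], c1[1:]
    for lit in c2:  # backtrack over the possible images of the first literal
        extended = match_literal(first, lit, theta)
        if extended is not None and theta_subsumes(rest, c2, extended):
            return True
    return False

c1 = [("+", "flies", ("X",)), ("-", "bird", ("X",))]  # flies(X) <- bird(X)
c2 = [("+", "flies", ("tweety",)), ("-", "bird", ("tweety",)),
      ("-", "small", ("tweety",))]  # flies(tweety) <- bird(tweety), small(tweety)
print(theta_subsumes(c1, c2), theta_subsumes(c2, c1))  # True False
```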

Both "is more general than" relations are useful for induction because:

- when generalizing a hypothesis $H_1$ to $H_2$, all formulae f entailed by the hypothesis $H_1$ and theory T will also be implied by hypothesis $H_2$ and theory T, i.e. $(T \cup H_1 \models f) \rightarrow (T \cup H_2 \models f)$.

- when specializing a hypothesis $H_1$ to $H_2$, all formulae f logically not entailed by hypothesis $H_1$ and theory T will not be implied by hypothesis $H_2$ and theory T either, i.e. $(T \cup H_1 \not\models f) \rightarrow (T \cup H_2 \not\models f)$.

⁴ FOIL and GOLEM should not be regarded as multiple predicate learners, cf. [8] and Section 5.1.

The two properties can be used to prune large parts of the search space. The second property is used in conjunction with positive examples: if a clause does not imply a positive example, all specializations of that clause can be pruned, as they cannot imply the example. The first property is used with negative examples: if a clause implies a negative example, all generalizations of that clause can be pruned, as they will imply that example too.

Most  inductive  learners  use  one  of  the  search  strategies  below  (cf.  [6]): 

- General to specific learners start from the most general clauses and repeatedly specialize them until they no longer imply negative examples; during the search they ensure that the clauses considered imply at least one positive example. Refining clauses is realized by employing a refinement operator, which is an operator that computes a set of specializations of a clause.

- Specific to general learners start from the most specific clause that implies a given example; they then generalize the clause until it cannot be generalized further without implying negative examples. Very often, generalization operators start from a clause and a positive example not implied by the clause; they then compute the starting clause (the most specific clause implying the example) for the example and compute the least general generalization of the two clauses (cf. [29]).
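The least general generalization used by specific to general learners rests on Plotkin's anti-unification of terms. The sketch below computes it for two atoms or terms, writing compound terms as tuples (functor, arg1, ..., argn); the lgg of clauses in [29] builds on this by generalizing compatible pairs of literals, which the sketch does not attempt. The generated variable names V1, V2, ... are an assumption and are taken not to clash with the input.

```python
def lgg(t1, t2, table=None):
    """Least general generalization (anti-unification) of two terms or atoms.
    Compound terms and atoms are tuples (functor, arg1, ..., argn); constants
    are strings. `table` maps each pair of differing subterms to one variable."""
    if table is None:
        table = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        return (t1[0],) + tuple(lgg(a, b, table) for a, b in zip(t1[1:], t2[1:]))
    if (t1, t2) not in table:  # the same pair always yields the same variable
        table[(t1, t2)] = f"V{len(table) + 1}"
    return table[(t1, t2)]

print(lgg(("flies", "tweety"), ("flies", "oliver")))  # ('flies', 'V1')
print(lgg(("p", "a", "a"), ("p", "b", "b")))          # ('p', 'V1', 'V1')
```

Reusing one variable for a repeated pair of subterms is what keeps the result least general: p(V1,V1) is more specific than p(V1,V2) and still covers both inputs.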

Both techniques repeat their procedure on a reduced example set if the clause found does not by itself imply all positive examples. They thus use an iterative process to compute disjunctive hypotheses consisting of more than one clause.
Hypotheses  generated  by  the  first  approach  are  usually  more  general  than  those 
generated  by  the  second  approach.  Therefore  the  first  approach  is  less  cautious 
and  makes  larger  inductive  leaps  than  the  second.  General  to  specific  search  is 
very  well  suited  for  empirical  learning  in  the  presence  of  noise  because  it  can 
easily  be  guided  by  heuristics.  Specific  to  general  search  strategies  seem  better 
suited  for  situations  where  fewer  examples  are  available  and  for  interactive  and 
incremental  processing.  In  the  rest  of  this  paper,  because  of  space  restrictions, 
we  will  only  discuss  general  to  specific  learning;  for  specific  to  general  learning, 
we  refer  to  [22,  4].  Furthermore,  because  there  are  more  results  for  strong  ILP 
than  for  weak  ILP,  we  start  with  the  former. 


4  Refinement  approaches  to  strong  ILP 

Central to general to specific approaches to ILP is the notion of a refinement operator (first introduced by [34]):


Definition 6. A refinement operator $\rho$ for a language bias B is a mapping from B to $2^B$ such that $\forall c \in B$: $\rho(c)$ is a set of specializations of c under θ-subsumption⁵.

We call the clauses $c' \in \rho(c)$ the refinements of c. In general, refinement operators depend on the language bias they are defined for. As an example, consider the refinement operator for clausal logic of Definition 7:

Definition 7. Clause $c' \in \rho(c)$ iff (1) c' = head(c), l ← body(c), or (2) c' = head(c) ← body(c), l, or (3) c' = cθ, where θ is a substitution and l a positive literal. Refinements of type (1) are called head refinements of c; refinements of type (2) body refinements.
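A finite version of the operator of Definition 7 is easy to enumerate once the vocabulary is fixed. In the sketch below, the vocabulary (a dict from predicate to arity), the restriction to the clause's variables plus one fresh variable, and the tuple encoding of literals are assumptions standing in for a language bias; type (3) refinements are limited to substitutions that unify two variables of the clause.

```python
from itertools import combinations, product

def clause_vars(clause):
    head, body = clause
    return sorted({a for lit in head + body for a in lit[1:] if a[:1].isupper()})

def substitute(lits, old, new):
    """Apply the substitution {old/new} to a tuple of literals."""
    return tuple(tuple(new if a == old else a for a in lit) for lit in lits)

def refine(clause, vocabulary):
    """Refinements of clause = (head, body) following Definition 7.
    Literals are tuples (predicate, arg1, ..., argn); capitalised strings are
    variables; head and body are tuples of literals."""
    head, body = clause
    vs = clause_vars(clause)
    fresh = next(f"V{i}" for i in range(len(vs) + 1) if f"V{i}" not in vs)
    for pred, arity in vocabulary.items():
        for args in product(vs + [fresh], repeat=arity):
            lit = (pred, *args)
            if lit not in head + body:
                yield (head + (lit,), body)  # (1) head refinement
                yield (head, body + (lit,))  # (2) body refinement
    for x, y in combinations(vs, 2):  # (3) c.theta with theta = {y/x}
        yield (substitute(head, y, x), substitute(body, y, x))

c = ((("flies", "X"),), ())  # flies(X) <- true
for r in refine(c, {"bird": 1}):
    print(r)
```

One of the four refinements printed is flies(X) ← bird(X), i.e. clause c1 from the earlier illustration; each refinement is θ-subsumed by its parent, as Definition 6 requires.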

Using refinement operators, it is easy to define a simple general to specific search algorithm for finding hypotheses. This is done in Algorithm 1, which is basically a simplification of the FOIL algorithm of [31]. FOIL learns DATALOG clauses using a refinement operator for DATALOG. In FOIL's repeat loop, different clauses are induced until all positive examples for the considered predicate are implied by the hypothesis. Once a clause is added to the hypothesis, all positive examples implied by that clause are deleted from the set of positive examples. To find one clause, FOIL repeatedly applies a refinement operator (a variant of that in Definition 7) until the clause does not imply any negative example for the predicate. FOIL is heuristically guided as it does not consider all
refinements  of  a  too  general  clause,  but  only  the  best  one  (the  criterion  is  based 
on  information  theory,  see  [31]).  This  amounts  to  a  hill-climbing  search  strategy. 
A  variant  of  this  algorithm  employing  beam-search  is  considered  in  [10]. 
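Algorithm 1 can be turned into a small runnable program for a restricted case. In the Python sketch below, clause bodies for a binary target predicate may only use the head variables X and Y (in the spirit of the constrained clauses of Section 2), coverage is tested directly against a hypothetical family database, and a simple "positives covered minus negatives covered" score stands in for FOIL's information-based heuristic; none of these choices are FOIL's own.

```python
from itertools import product

# Hypothetical background facts T (a tiny family database, not from the paper).
T = {("parent", "ann", "mary"), ("parent", "ann", "tom"),
     ("parent", "tom", "eve"), ("parent", "tom", "ian"),
     ("female", "ann"), ("female", "mary"), ("female", "eve")}
VOCABULARY = {"parent": 2, "female": 1}
HEAD_VARS = ("X", "Y")  # target(X, Y); bodies use only these variables

def covers(body, example):
    """Does T u {target(X,Y) <- body} imply target(example)?"""
    theta = dict(zip(HEAD_VARS, example))
    return all((lit[0], *(theta[v] for v in lit[1:])) in T for lit in body)

def refinements(body):
    """Body refinements: add one literal built from the head variables."""
    for pred, arity in VOCABULARY.items():
        for args in product(HEAD_VARS, repeat=arity):
            if (pred, *args) not in body:
                yield body + [(pred, *args)]

def score(body, pos, neg):
    # Stand-in for FOIL's information gain (assumption).
    return sum(covers(body, e) for e in pos) - sum(covers(body, e) for e in neg)

def simplified_foil(pos, neg):
    pos, hypothesis = set(pos), []
    while pos:                                   # repeat ... until P is empty
        body = []                                # most general clause target(X, Y)
        while any(covers(body, e) for e in neg):
            candidates = list(refinements(body))
            if not candidates:                   # bias too weak to exclude negatives
                return hypothesis
            body = max(candidates, key=lambda b: score(b, pos, neg))  # hill climbing
        covered = {e for e in pos if covers(body, e)}
        if not covered:
            return hypothesis
        hypothesis.append(body)
        pos -= covered                           # delete implied positive examples
    return hypothesis

# Target daughter(X, Y), "X is a daughter of Y" (hypothetical examples).
POS = [("mary", "ann"), ("eve", "tom")]
NEG = [("tom", "ann"), ("ian", "tom"), ("ann", "mary")]
print(simplified_foil(POS, NEG))  # [[('female', 'X'), ('parent', 'Y', 'X')]]
```

The learned body corresponds to daughter(X,Y) ← female(X), parent(Y,X). Like Algorithm 1, the sketch only checks each clause against the negatives, never the hypothesis as a whole.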

Together with GOLEM [25], which works specific to general instead of general to specific, FOIL is one of the best known empirical ILP systems. Both systems are very efficient and have been applied to a number of real-life applications of single-predicate learning [10, 36, 1, 26, 15, 11]. However, because they employ the strong ILP setting, they suffer from some problems when learning multiple predicates (see [8] and below).

5  Learning  multiple  predicates 

5.1  Strong  ILP  setting 

Examining the problem specification of the strong ILP setting reveals that the clauses in a hypothesis are not independent of each other. Indeed, consider clauses c1 and c2. Assume that $T \cup \{c_i\}$ ($i = 1, 2$) implies some positive and no negative examples. So both $c_i$ could contribute to a solution in the strong setting. Therefore one might consider a hypothesis H containing {c1, c2}. Although neither c1 nor c2 implies negative examples, it may be that their conjunction $T \cup \{c_1, c_2\}$ does. This reveals that in the strong ILP setting different clauses of a hypothesis may interact. Because of these interactions, learning multiple predicates in the strong ILP setting is necessarily order dependent, as the meaning of one clause depends on the other clauses in the hypothesis. Therefore the order of learning different clauses affects the results of the learning task and even the existence of solutions. As an extreme case, a multiple predicate learning task may only be solvable when clauses (or predicates) are learned in a specific order (because of bias effects and interactions between clauses). In less extreme situations, certain orders may result in more complex (and imprecise) definitions, whereas better orders may yield good results.

⁵ Sometimes $\rho(c)$ is defined as the set of maximal specializations of c under θ-subsumption, cf. [34, 7].

for all predicates p occurring in E do
    P := the set of all positive examples for p
    N := the set of all negative examples for p
    hypothesis H := ∅
    repeat
        clause c := p(X1,...,Xn) ; where all Xi are different variables
        while T ∪ {c} implies negative examples from N do
            build the set S of all refinements of c
            c := the best element of S
        endwhile
        add c to H
        delete all examples from P implied by T ∪ {c}
    until P is empty
endfor

Algorithm 1: A simplified FOIL.

Present ILP learners in the strong setting provide little help in determining the next clause or predicate to be learned. Indeed, the incremental systems (e.g. MOBAL [14]) rely on the user, as they treat the examples in the given order; the interactive ones (CIGOL [24], CLINT [4], and MIS [34]) also generate queries to further simplify the problem; whereas the empirical ones (such as GOLEM [25] and FOIL [31]) leave the problem to the user. Consider for instance the FOIL algorithm outlined in Algorithm 1. This algorithm checks only that individual clauses are consistent, not that the hypothesis as a whole is consistent with the negative examples. These and other problems with existing strong ILP learners and multiple predicate learning are discussed in detail in [8].

An attempt to alleviate the above problems is incorporated in the MPL system of [8]. The MPL algorithm is an adaptation of FOIL using the following principles:

- MPL performs a hill-climbing strategy on hypotheses; in each step of this cycle it induces a clause; if adding the induced clause to the current hypothesis results in inconsistencies, MPL removes some clauses from the current hypothesis; otherwise it continues with the updated current hypothesis to learn clauses for unimplied positive examples, if any.

- To learn one clause, MPL employs a beam search similar to mFOIL [10]. The only modification is that rather than storing (definite) clauses and computing their best refinements, MPL keeps track of bodies of clauses, and selects at each step the best body refinements. The quality of a body is defined here as the quality of the best clause in its head refinements. Computing the body first and filling in its head later allows the learner to determine dynamically the next predicate to be learned.

- When estimating the quality of a clause, MPL takes into account not only the examples of the predicate in the head of the clause, but also the other ones. This reduces the number of overgeneralizations.

Preliminary  experiments  have  shown  that  MPL  correctly  induces  hypotheses 
for  some  multiple  predicate  learning  problems  beyond  the  scope  of  the  current 
empirical  ILP  approaches,  cf.  [8]. 

5.2  Weak  ILP  setting 

Many of the problems sketched above disappear when adopting the weak ILP setting. Indeed, the main problems in the strong setting were concerned with order dependency and dependencies among different clauses. Using the weak semantics, these problems disappear because $\mathrm{Comp}(T \cup E) \models H_1$ and $\mathrm{Comp}(T \cup E) \models H_2$ imply $\mathrm{Comp}(T \cup E) \models H_1 \cup H_2$. Therefore clauses can be investigated completely independently of each other (they could even be searched in parallel); it does not matter whether $H_1$ or $H_2$ is found first. Furthermore, interactions between different clauses are no longer important; if both clauses are individual solutions, their conjunction is also a solution. This property holds because the hypothesis is produced deductively in the weak setting. In contrast, the strong ILP setting allows one to make inductive leaps, whereby the hypothesis together with the theory may entail facts not in the evidence. As different clauses in the strong setting entail different facts, the order of inducing and the way of combining different clauses in the strong setting affect the set of entailed facts.

A key observation underlying the CLAUDIEN system [7] is that clauses c not entailed by the completed database $\mathrm{Comp}(T \cup E)$ are overly general, i.e. there are substitutions θ for which body(c)θ is true and head(c)θ is false in the completed database. In Section 4, we saw how overly general clauses could be specialized by applying refinement operators. The same applies here. In particular, body refinements decrease the number of substitutions for which body(c) holds, whereas head refinements increase the number of substitutions for which head(c) holds.

Based on this, we designed the CLAUDIEN algorithm (see Algorithm 2) to induce clausal theories from databases. CLAUDIEN starts with a set of clauses Q, initially containing only the most general clause false, and repeatedly refines overly general clauses in Q until they satisfy the database. At point (1), CLAUDIEN prunes away clauses already entailed by the hypothesis found so far, because neither they nor any of their refinements can yield new information⁶. Furthermore, at point (2) it tests whether refinements are relevant. Roughly speaking, relevance means that at least one substitution is handled differently by the refinement than by its parent clause. Under certain conditions on the language bias and refinement operator discussed in [7], irrelevant clauses can safely be pruned away. Intuitively, an irrelevant refinement c' of c is a refinement for which c and c' are logically equivalent in $\mathrm{Comp}(T \cup E)$. Notice also that CLAUDIEN does not employ a heuristic search strategy, but a complete search of the relevant parts of the search space. Complete search⁷ makes it possible to find all properties expressible in the given language bias; thereby the most general hypothesis explaining the database is found.

⁶ This is verified using Stickel's fast theorem prover [35].

Q := {false}; H := ∅;
while Q ≠ ∅ do
    delete c from Q
    if Comp(T ∪ E) ⊨ c
    then if H ⊭ c                                  (1)
         then add c to H
         endif
    else for all relevant c' ∈ ρ(c) do             (2)
             add c' to Q
         endfor
    endif
endwhile

Algorithm 2: A simplified CLAUDIEN.
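For intuition, Algorithm 2 can be run on a deliberately small language: clauses over the unary predicates female, human and male with the single variable X and at most three literals. In the Python sketch below the database is hypothetical, $\mathrm{Comp}(T \cup E)$ is evaluated directly (listed facts true, everything else false), the test H ⊭ c at point (1) is approximated by θ-subsumption (a sound but incomplete stand-in for the theorem prover), and the relevance test at point (2) is replaced by simply skipping duplicate clauses. The search is breadth-first rather than depth-first iterative deepening.

```python
from collections import deque

# Hypothetical database: the unary facts of each constant. Under Clark's
# completion, every fact not listed here is false.
DATABASE = {
    "luc": {"male", "human"},
    "soetkin": {"female", "human"},
    "rex": set(),  # a constant with no listed properties
}
PREDICATES = ("female", "human", "male")
MAX_LITERALS = 3

def valid(clause):
    """head <- body is true in the completed database: each constant that
    satisfies every body literal satisfies at least one head literal."""
    head, body = clause
    return all(head & props for props in DATABASE.values() if body <= props)

def subsumes(general, specific):
    # With one variable, theta-subsumption is plain set inclusion; used as a
    # sound but incomplete stand-in for "H entails c".
    return general[0] <= specific[0] and general[1] <= specific[1]

def refinements(clause):
    head, body = clause
    if len(head) + len(body) >= MAX_LITERALS:
        return
    for p in PREDICATES:
        if p not in head | body:
            yield (head | {p}, body)  # head refinement
            yield (head, body | {p})  # body refinement

def claudien():
    false = (frozenset(), frozenset())  # the empty clause, most general
    queue, seen, hypothesis = deque([false]), {false}, []
    while queue:
        c = queue.popleft()  # breadth-first
        if valid(c):
            if not any(subsumes(h, c) for h in hypothesis):  # point (1)
                hypothesis.append(c)
        else:
            for r in refinements(c):  # point (2): duplicates skipped, no relevance test
                if r not in seen:
                    seen.add(r)
                    queue.append(r)
    return hypothesis

def show(clause):
    head, body = clause
    def lits(preds):
        return ", ".join(f"{p}(X)" for p in sorted(preds))
    return f"{lits(head)} <- {lits(body)}".strip()

for clause in claudien():
    print(show(clause))
```

On this toy database the sketch returns four clauses: ← female(X), male(X), human(X) ← female(X), human(X) ← male(X), and female(X), male(X) ← human(X), which mirror the first group of clauses in Example 3. Because the search is complete and breadth-first, more general valid clauses are found first and their specializations are pruned at point (1).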

Example  3.  (A  CLAUDIEN  example) 

We  ran  CLAUDIEN  on  a  database  containing  family  relations  father,  mother, 
parent,  male,  female  and  human.  The  bias  restricted  clauses  to  contain  at  most 
two  variables  and  at  most  3  literals.  Any  combination  of  predicates  in  clauses 
was  allowed.  CLAUDIEN  derived  the  following  theory: 

← female(X), male(X)
human(X) ← male(X)   (1)
human(X) ← female(X)   (2)
female(X), male(X) ← human(X)   (3)

← parent(X,X)
parent(X,Y) ← mother(X,Y)
parent(X,Y) ← father(X,Y)

← parent(X,Y), parent(Y,X)

← father(X,Y), mother(X,Y)
father(X,Y), mother(X,Y) ← parent(X,Y)
human(Y) ← parent(X,Y)
human(X) ← parent(X,Y)
female(X) ← mother(X,Y)
male(X) ← father(X,Y)

⁷ In the implementation, we employ depth-first iterative deepening [16].

The induced theory contains most of the relevant information about the involved predicates. It could be used as an integrity theory for a database. It also contributes to understanding the domain of family relations as it contains some relevant regularities. Notice also that the theory can be used for predictions. Indeed, if parent(luc,soetkin) and male(luc) are asserted, we can derive ¬mother(luc,soetkin), father(luc,soetkin), female(soetkin) ∨ male(soetkin), human(soetkin), etc. On the other hand, if only father(luc,soetkin) is asserted, we can derive human(luc), human(soetkin), female(soetkin) ∨ male(soetkin), male(luc), parent(luc,soetkin), ¬female(luc). It is easy to see that, using the above theory, prediction can start from basically any fact(s) for any set of predicates. This contrasts significantly with the strong ILP setting, where the inferences only go in one direction. Given the usual approaches, the learner would induce only clauses for specific predicates. E.g. the clauses (1) and (2) could be inferred, or alternatively the two normal clauses corresponding to clause (3), i.e. male(X) ← human(X), ¬female(X) (4) and female(X) ← human(X), ¬male(X) (5). The strong ILP systems would not allow one to induce both (1-2) and (4-5), because the resulting program loops. Therefore the theory induced in strong ILP can only be used in one direction. Given (4-5), one can deduce facts for male and female from facts for human. Given (1-2), one can deduce facts for human from facts for male or female.


6  Conclusions 


We have defined and investigated different faces of ILP, focussing on the semantics of ILP. We believe the weak semantics puts several issues in ILP in a new perspective. First, the weak setting allows one to induce full clausal theories. Second, the weak setting makes the learning of different clauses order independent, which is especially useful when learning multiple predicates. Third, the weak setting derives properties of examples instead of rules generating examples. Although such properties cannot always be used for predicting the truth values of facts, we have seen that the use of full clausal theories can result in predictions not possible using the less expressive strong ILP setting. Fourth, it was argued that the common practice of applying the closed world assumption on the example set in strong ILP corresponds to the use of Clark's completion and a deductive setting for weak ILP.

On the other hand, when we are interested in predictions, the strong setting is more appropriate, as induced hypotheses in the strong setting can always be used for prediction. Furthermore, although learning multiple predicates in the strong setting is less efficient, we have presented a preliminary approach to multiple predicate learning in the strong setting.


Acknowledgements  This  work  is  part  of  the  ESPRIT  Basic  Research  project 
no.  6020  on  Inductive  Logic  Programming.  Luc  De  Raedt  is  supported  by  the 
Belgian  National  Fund  for  Scientific  Research;  Nada  Lavrac  is  funded  by  the 
Slovenian  Ministry  of  Science  and  Technology.  The  authors  are  grateful  to  Ivan 
Bratko,  Danny  De  Schreye,  Saso  Dzeroski,  Bern  Martens,  Stephen  Muggleton, 
Gunther  Sablon  for  discussions,  suggestions  and  encouragements  concerning  this 
work.  Special  thanks  to  Maurice  Bruynooghe  and  Peter  Flach  for  suggesting 
many  improvements  to  an  earlier  version  of  this  paper. 


References 

1.  I.  Bratko,  S.  Muggleton,  and  A.  Varsek.  Learning  qualitative  models  of  dynamic 
systems.  In  Proceedings  of  the  8th  International  Workshop  on  Machine  Learning, 
pages  385-388.  Morgan  Kaufmann,  1991. 

2.  Wray  Buntine.  Generalized  subsumption  and  its  application  to  induction  and 
redundancy.  Artificial  Intelligence,  36:375-399,  1988. 

3.  K.  L.  Clark.  Negation  as  failure.  In  H.  Gallaire  and  J.  Minker,  editors,  Logic  and 
Databases,  pages  293-322.  Plenum  Press,  1978. 

4. L. De Raedt. Interactive Theory Revision: an Inductive Logic Programming
Approach. Academic Press, 1992.

5.  L.  De  Raedt  and  M.  Bruynooghe.  Belief  updating  from  integrity  constraints  and 
queries.  Artificial  Intelligence,  53:291-307,  1992. 

6. L. De Raedt and M. Bruynooghe. A unifying framework for concept-learning
algorithms. The Knowledge Engineering Review, 7(3):251-269, 1992.

7. L. De Raedt and M. Bruynooghe. A theory of clausal discovery. Technical Report
KUL-CW-164, Department of Computer Science, Katholieke Universiteit Leuven,
1993. To appear in Proceedings of the 3rd International Workshop on Inductive
Logic Programming.

8. L. De Raedt, N. Lavrac, and S. Dzeroski. Multiple predicate learning. Technical
Report KUL-CW-165, Department of Computer Science, Katholieke Universiteit
Leuven, 1993. To appear in Proceedings of the 3rd International Workshop on
Inductive Logic Programming.

9.  B.  Dolsak  and  S.  Muggleton.  The  application  of  inductive  logic  programming  to 
finite  element  mesh  design.  In  S.  Muggleton,  editor,  Inductive  Logic  Programming, 
pages  453-472.  Academic  Press,  1992. 

10. S. Dzeroski and I. Bratko. Handling noise in inductive logic programming. In
S. Muggleton, editor, Proceedings of the 2nd International Workshop on Inductive
Logic Programming, 1992.

11. C. Feng. Inducing temporal fault diagnostic rules from a qualitative model. In
Proceedings of the 8th International Workshop on Machine Learning, pages 403-406.
Morgan Kaufmann, 1991.

12. P. Flach. A framework for inductive logic programming. In S. Muggleton, editor,
Inductive Logic Programming. Academic Press, 1992.

13. N. Helft. Induction as nonmonotonic inference. In Proceedings of the 1st
International Conference on Principles of Knowledge Representation and Reasoning,
pages 149-156. Morgan Kaufmann, 1989.

14. J-U. Kietz and S. Wrobel. Controlling the complexity of learning in logic through
syntactic and task-oriented models. In S. Muggleton, editor, Inductive Logic
Programming. Academic Press, 1992.




15. R.D. King, S. Muggleton, R.A. Lewis, and M.J.E. Sternberg. Drug design by
machine learning: the use of inductive logic programming to model the structure-activity
relationships of trimethoprim analogues binding to dihydrofolate reductase.
Proceedings of the National Academy of Sciences, 1992.

16. R. Korf. Depth-first iterative deepening: an optimal admissible search. Artificial
Intelligence, 1985.

17. N. Lavrac and S. Dzeroski. Inductive learning of relations from noisy examples.
In S. Muggleton, editor, Inductive Logic Programming Workshop, pages 495-514.
Academic Press, 1992.

18. N. Lavrac, S. Dzeroski, and M. Grobelnik. Learning non-recursive definitions of
relations with LINUS. In Yves Kodratoff, editor, Proceedings of the 5th European
Working Session on Learning, volume 482 of Lecture Notes in Artificial Intelligence.
Springer-Verlag, 1991.

19. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.

20.  T.M.  Mitchell.  Generalization  as  search.  Artificial  Intelligence,  18:203-226,  1982. 

21.  S.  Muggleton.  Inductive  logic  programming.  New  Generation  Computing, 
8(4):295-317,  1991. 

22.  S.  Muggleton,  editor.  Inductive  Logic  Programming.  Academic  Press,  1992. 

23. S. Muggleton, M. Bain, J. Hayes-Michie, and D. Michie. An experimental
comparison of human and machine learning formalisms. In Proceedings of the 6th
International Workshop on Machine Learning, pages 113-118. Morgan Kaufmann,
1989.

24. S. Muggleton and W. Buntine. Machine invention of first order predicates by
inverting resolution. In Proceedings of the 5th International Conference on Machine
Learning, pages 339-351. Morgan Kaufmann, 1988.

25. S. Muggleton and C. Feng. Efficient induction of logic programs. In Proceedings of
the 1st Conference on Algorithmic Learning Theory. Ohmsha, Tokyo, Japan, 1990.

26. S. Muggleton, R.D. King, and M.J.E. Sternberg. Protein secondary structure
prediction using logic. Protein Engineering, 7:647-657, 1992.

27. T. Niblett. A study of generalisation in logic programs. In D. Sleeman, editor,
Proceedings of the 3rd European Working Session on Learning, pages 131-138.
Pitman, 1988.

28.  G.  Piatetsky-Shapiro  and  W.  Frawley,  editors.  Knowledge  discovery  in  databases. 
The  MIT  press,  1991. 

29.  G.  Plotkin.  A  note  on  inductive  generalization.  In  Machine  Intelligence,  volume  5. 
Edinburgh  University  Press,  1970. 

30.  J.R.  Quinlan.  Induction  of  decision  trees.  Machine  Learning,  1:81-106,  1986. 

31. J.R. Quinlan. Learning logical definitions from relations. Machine Learning,
5:239-266, 1990.

32.  R.  Reiter.  On  asking  what  a  database  knows.  In  J.W.  Lloyd,  editor,  Computational 
Logic,  pages  96-113.  Springer- Verlag,  1990. 

33.  S.J.  Russell.  The  use  of  knowledge  in  analogy  and  induction.  Pitman,  1989. 

34.  E.Y.  Shapiro.  Algorithmic  Program  Debugging.  The  MIT  press,  1983. 

35. M.E. Stickel. A Prolog technology theorem prover: implementation by an extended
Prolog compiler. Journal of Automated Reasoning, 4(4):353-380, 1988.

36. C. Vermeulen. Een toepassing van automatisch leren op weersvoorspellingen [An
application of machine learning to weather forecasting]. Master's thesis, Department
of Computer Science, Katholieke Universiteit Leuven, 1992. In Dutch.


CONSENSUS:  A  Method  for  the  Development  of 
Distributed  Intelligent  Systems 


Michael  Bateman 

British  Aerospace  Defence  Ltd, 

Warton Aerodrome, Preston PR4 1AX, UK

Sean Martin

Cambridge  Consultants  Ltd, 

Cambridge Science Park, Cambridge CB4 4DW, UK

Andrew  Slade 

School of Engineering and Computer Science, University of Durham,
South Road, Durham DH1 3LE, UK


Abstract. This paper reports on the work of CONSENSUS, a collaborative project
funded by the United Kingdom Department of Trade and Industry Advanced
Technology Programme. The project has produced a proven engineering method
for developing large scale real-time systems with distributed intelligent
components. The paper presents an overview of the method that has been
developed and illustrates how it has been used to construct a large scale application
in the area of air traffic control. The method has also been used to develop a
dynamic tactical planning application that reacts to events as they arise, which is
also briefly described in the paper.


1  Introduction 

Many  applications,  where  new  technologies  are  being  deployed,  are  characterised  by: 

•  increasing  quantity  of  data  from  sensors  and  other  sources, 

• increasing need for interpretation of data,

• reducing response times,

•  increasing  workload, 

•  increasing  need  for  planning  in  response  to  events  as  they  arise, 

•  increasing  demand  for  multiplexing,  display  management  and  control. 

The commercial justification for new systems is usually based on delivering greater
throughput or performance without a corresponding increase in staffing levels. This
requires the division of responsibility between man and machine to be repartitioned:
the machine will perform some tasks that may previously have been performed
by the man, and these tasks require intelligent behaviour. There is thus a need for increasing
intelligence in applications where new technologies are deployed.

In addition, since most practical applications are inherently distributed or naturally
concurrent, there is a need for distributed intelligent behaviour. In most cases a
centralised approach is inappropriate, which emphasises the need for distributed
intelligence; a distributed system is, for example, more resilient in the event of partial
system failure.




There is, as yet, no accepted software engineering method for developing
real-time distributed knowledge based systems. As Bond and Gasser [3] note:

"An  engineering  science  perspective  on  Distributed  AI 
would investigate how to build automated, coordinated
problem  solvers  for  specific  applications." 

The CONSENSUS project addresses this need, and has produced a proven
engineering method for developing operational systems with distributed knowledge
based components that address the requirements listed above. This provides a sound
basis for developing operational systems that maximise the benefits delivered by
advanced software technology.

2  The  Scope  of  the  CONSENSUS  Method 

The CONSENSUS project involves some fifteen man years of effort shared by three
partners over a three year period. The partners are British Aerospace Defence
Limited, Cambridge Consultants Limited and the University of Durham, all of whom
have extensive experience in concurrent real-time knowledge based engineering
applications.

The scope of the CONSENSUS method is illustrated in Figure 1. It concerns the
development of large real-time systems comprising knowledge based components that
must be engineered into a system that meets its requirements. By implication, the scope of
the project covers the complete life cycle from concept through to operation, and
includes:

•  an  analysis  of  the  need  and  functioning  of  the  system, 

•  derivation  of  an  appropriate  architectural  design, 

•  detailed  design  and  implementation, 

•  verification,  validation  and  testing, 

•  maintenance  and  support. 



The method helps the designer to visualise how distributed intelligence will function
in terms of the interaction between knowledge based components, and to assess the
associated demands for communication.


The CONSENSUS approach is suited to applications that exhibit any of the following
characteristics, as illustrated in Figure 2:

•  data  from  a  variety  of  sources  is  used  and  combined  (i.e.  data  fusion), 

• a model of the external world is constructed from this information,


•  real  time  reasoning  is  required  to  react  to  events  as  they  arise, 

•  off-line  reasoning  is  required  for  longer  term  planning, 

•  there  may  be  several  reasoning  modules, 

•  there  is  a  controller  for  supervising  actions, 

•  there  are  effectors  (i.e.  something  is  controlled  or  displayed), 

•  subsystems  can  be  identified  that  co-operate  to  fulfil  the  system  goals, 

•  there  may  be  varying  granularity  between  the  subsystems, 

•  the  application  is  inherently  distributed  or  runs  on  a  distributed  network. 

Many applications have these characteristics, and there are often two high level
architectural approaches, both of which are addressed by the CONSENSUS method:

• a distributed system with centralised functions: where data may be gathered and
processed at source but overall coordination is exercised from a control room,

•  a  distributed  system  with  distributed  functions:  where,  as  in  telecommunications 
network  management,  control  is  distributed. 

3  The  CONSENSUS  Method 

3.1  Overview 

The CONSENSUS method is a system specification approach made up of a
requirements model, based on a current software engineering approach for the
development of complex software systems, and an architecture model, based on a
distributed blackboard approach. The key points are:

• the requirements model and the architecture model are developed together;

•  the  requirements  model  defines  what  the  system  is  to  do  and  is  independent  of  the 
implementation  technology; 

• the architecture model defines how the system is to be structured and must take
account of how the system is to be implemented.

The requirements and architecture models are developed in parallel, starting with a
high level description of the system, and proceeding by refinement and iteration until
a detailed and complete definition of the system is produced. This view of systems
development is illustrated in Figure 3.


[Figure residue: Principles of System Specification: Decomposition and Concurrent Processes; Information Hiding; Modularity; Abstraction; Naturally Fit Together]


Fig.  3.  Overview  of  the  Method 


The primary benefit of this approach is that early partitioning and allocation of
functions in the system helps identify critical functions, which can be prototyped or
reappraised.

3.2 The CONSENSUS Requirements Model

As illustrated in Figure 3, the approach for developing the requirements model is
based on real-time structured analysis techniques using hierarchical data flow and
process decomposition, together with control and timing specifications, to allow for
the requirements of real-time systems.

The  behaviour  of  the  system  is  analysed  and  decomposed  into  a  set  of  simpler  process 
models,  each  of  which  is  characterised  by  a  process  definition.  In  practice  these  map 
quite  well  to  rule  sets  defining  explicit  knowledge  in  the  implementation. 

The process of functional decomposition is taken to the point where the system
requirements cannot be refined any further, and a level of detail is reached where
implementation (that is to say architectural) considerations come into play.

The process of functional decomposition starts with a top-level view of the system
and, by a process of iterative refinement, produces:

• a hierarchy of data-flow and control-flow diagrams, starting with a system context
diagram and followed by as many levels of decomposition as are necessary,
specifying how the system is structured into processes, data-flows and
control-flows;

•  a  set  of  process  specifications  defining  how  inputs  are  transformed  into  outputs; 

•  a  set  of  control  specifications  defining  the  interaction  of  control  signals; 

•  a  definition  of  the  objects  and  their  structure; 



• a set of timing specifications defining the limits in response time allowed between
inputs and outputs.
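The timing specifications listed above pair an input with an output and a maximum response time. A minimal sketch, assuming each timing specification can be written as an (input flow, output flow, limit in seconds) triple and that response times have been measured during testing; the flow pairings and the limits below are hypothetical, not taken from the CONSENSUS documentation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingSpec:
    """Maximum response time allowed between an input flow and an output flow."""
    input_flow: str
    output_flow: str
    max_seconds: float


def timing_violations(specs, observed):
    """Return (spec, measured) pairs whose measured response time exceeds the limit.

    `observed` maps (input_flow, output_flow) to a measured response time in
    seconds; specifications with no measurement are skipped, not reported.
    """
    violations = []
    for spec in specs:
        measured = observed.get((spec.input_flow, spec.output_flow))
        if measured is not None and measured > spec.max_seconds:
            violations.append((spec, measured))
    return violations


# Hypothetical specifications and measurements, for illustration only.
specs = [
    TimingSpec("Aircraft Positions", "Warnings", 2.0),
    TimingSpec("Datalink Commands", "Datalink Data", 0.5),
]
observed = {
    ("Aircraft Positions", "Warnings"): 3.1,
    ("Datalink Commands", "Datalink Data"): 0.2,
}

for spec, measured in timing_violations(specs, observed):
    print(f"{spec.input_flow} -> {spec.output_flow}: {measured}s > {spec.max_seconds}s")
```

Reporting missing measurements separately, rather than skipping them, would be a natural extension if every timing specification had to be covered by a test.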

The above are developed in conjunction with the architecture model, which describes and
defines the system in terms of the implementation technology. The process of
functional decomposition should produce collections of functions which can be
grouped together into separate processes that can work concurrently and collaborate,
by communication, towards the overall goal of the system.

It may also be possible to identify groups of potentially concurrent functions at
different degrees of resolution, but the decision on this matter depends more on
implementation constraints and should be taken from the system architecture model.

3.3 The CONSENSUS Architecture Model

Soon after beginning the requirements model, the developer begins the task of
deriving the architecture model, in which constraints imposed by implementation
considerations can be more appropriately addressed. The development of the
architecture specification is based on a technique for developing distributed
blackboard and multi-agent architectures. This allows multiple components of the
system to communicate and thereby co-operate towards achieving the system's goals.

This approach encourages a modular design comprising several independent
communicating knowledge based agents (KBAs). Each KBA comprises a local
blackboard, accessed by local knowledge sources, a controller, and communication
channels to other KBAs. The communication between KBAs is constrained to
encourage modularity.

The system architecture model is derived in parallel to the requirements model. It is
the purpose of the system architecture model to:

•  identify  the  set  of  agents  which  make  up  the  system; 

•  define  the  information  flow  between  the  agents  in  the  system;  and 

• specify the channels on which the information flows.

This is achieved using an arrangement similar to that of the requirements
model. Again, diagrams and supporting textual specifications are used:

•  a  set  of  architecture  flow  diagrams  showing  the  configuration  of  the  knowledge 
based  agents  and  the  data  flow  between  them; 

• a set of architecture interconnect diagrams showing the physical interconnection
between the knowledge based agents in terms of channels;

•  a  set  of  architecture  module  specifications  capturing  the  allocation  of  the 
requirements  for  each  knowledge  based  agent; 

•  a  set  of  architecture  interconnect  specifications  detailing  the  properties  and 
characteristics  of  each  channel  between  agents. 
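One check that these four artefacts make possible is that every data flow between two agents in the architecture flow diagrams is carried by a channel in the interconnect diagrams. The sketch below assumes that flows are recorded as (source agent, target agent, flow name) triples and that channels are recorded as unordered agent pairs, which treats them as bidirectional. The agent names reuse ATC workstation components described later in this paper, but the flows and channels themselves are hypothetical:

```python
def unchannelled_flows(flows, channels):
    """Return the inter-agent flows that no declared channel carries.

    flows:    iterable of (source_agent, target_agent, flow_name)
    channels: set of frozensets {agent_a, agent_b}; a channel is assumed
              here to carry traffic in both directions.
    Flows internal to one agent need no channel and are ignored.
    """
    missing = []
    for source, target, name in flows:
        if source != target and frozenset((source, target)) not in channels:
            missing.append((source, target, name))
    return missing


# Hypothetical architecture flow and interconnect data.
flows = [
    ("Communicator", "Monitor", "Issued Command"),
    ("Monitor", "MMI", "Warnings"),
    ("Predictor", "MMI", "Predicted Courses"),
]
channels = {
    frozenset(("Communicator", "Monitor")),
    frozenset(("Monitor", "MMI")),
}

print(unchannelled_flows(flows, channels))
```

Here the Predictor-to-MMI flow is reported, because no channel joins those two agents. If channels were directional, ordered pairs would replace the frozensets.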

The grouping of functions that results from functional decomposition, together
with the decision of where to pitch the transition point from functional decomposition
to knowledge based agents, will yield the division of the overall task into a collection
of concurrent cooperating agents. There are a number of factors to consider in
deciding when (in terms of the decomposition) to make the transition from functional
decomposition to KBAs:




Although the technological bases for the requirements and the architecture models are
quite different, the fundamental principles are very similar, and consequently they
fit together very well to form an integrated approach for developing systems with
concurrent knowledge based components. They share the following principles:

• decomposition: into co-operating concurrent processes communicating via defined
channels;

•  information  hiding:  whereby  individual  processes  are  not  concerned  with  the 
internal  details  of  other  processes; 

•  modularity:  whereby  individual  processes  can  be  designed,  developed  and  tested 
in  isolation  prior  to  integration  with  other  processes; 

•  abstraction:  allowing  high,  as  well  as  low,  level  system  specifications. 

Although the primary purpose of the CONSENSUS method is the design of
knowledge based components, its sound basis in software engineering principles
allows it to be applied to purely procedural implementations and to mixed systems
comprising both procedural and knowledge based components.

4  The  CONSENSUS  Demonstrator  Applications 

The CONSENSUS method can be illustrated by the development of one of the two
demonstrator applications of the project, which is described below.

4.1  Air  Traffic  Control  Demonstrator 

The application is based on a report, publicly available at the Civil Aviation Authority
library [2], defining an intelligent system which could act as an assistant to an air
traffic controller. The top level of the system is made up of a simulator, an ATC
workstation and the interface between the two. The application is implemented in
Muse [4]. Muse is a real-time AI toolkit based upon the blackboard model of problem
solving, providing a number of knowledge representation styles (rules, demons,
objects, and so on). The final implementation uses 27 Muse processes and one C
process, communicating with each other through sockets.

The system comprises the components of the ATC workstation and a simulator of
controlled aircraft; the overall organisation is shown in Figure 4.




• the total number of KBAs should not be so great that the communication
overheads are prohibitive. KBAs will usually be complex rather than simple
processes;

• KBAs should be manageable in size and complexity. If there is too much
functionality in a given KBA it should be split up (especially when there is an
apparent fracture of functionality);

•  if  simple  processes  are  closely  associated  and  share  data  or  communicate  with 
each  other,  then  they  should  be  grouped  together  in  a  single  KBA; 

• if simple processes share the same parent then they should probably be in the
same KBA.

In the above context, it should be borne in mind that simple processes are those
which have too low a workload to be separate KBAs, for reasons of communication
overheads. These are not necessarily the same as primitive processes in the
requirements model.
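The grouping guidelines above can be read as a merging procedure over the processes of the requirements model. The sketch below is illustrative and makes several assumptions:

* each process has a numeric workload estimate;
* a process counts as "simple" when its workload falls below a hypothetical threshold;
* processes use the hierarchical numbering of the requirements model, so 1.2 is a child of 1;
* a list records which process pairs share data or communicate.

The guideline about splitting over-large KBAs is not modelled, and all process numbers and workloads are hypothetical:

```python
def parent_of(process_id):
    """Parent in the hierarchical numbering, e.g. '2.5.1' -> '2.5'."""
    return process_id.rpartition(".")[0]


def candidate_kbas(workload, shared_data, threshold):
    """Group processes into candidate KBAs using two of the guidelines.

    workload:    dict process number -> estimated workload
    shared_data: list of (p, q) pairs that share data or communicate
    threshold:   processes with workload below this count as simple
    Simple processes that share data, or that share a parent, are merged;
    every other process stays in a KBA of its own.
    """
    simple = {p for p, w in workload.items() if w < threshold}
    root = {p: p for p in workload}

    def find(p):
        # Union-find lookup with path halving.
        while root[p] != p:
            root[p] = root[root[p]]
            p = root[p]
        return p

    def union(p, q):
        root[find(p)] = find(q)

    # Guideline: closely associated simple processes that share data go together.
    for p, q in shared_data:
        if p in simple and q in simple:
            union(p, q)

    # Guideline: simple processes with the same parent probably go together.
    first_child = {}
    for p in sorted(simple):
        parent = parent_of(p)
        if parent in first_child:
            union(first_child[parent], p)
        else:
            first_child[parent] = p

    groups = {}
    for p in workload:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical workloads and data sharing, for illustration only.
workload = {"1.1": 1, "1.2": 1, "1.3": 8, "2.1": 2, "2.2": 9}
print(candidate_kbas(workload, shared_data=[("1.2", "2.1")], threshold=5))
# [['1.1', '1.2', '2.1'], ['1.3'], ['2.2']]
```

The two heavily loaded processes, 1.3 and 2.2, stay on their own. The three simple processes end up in one candidate KBA: 1.1 and 1.2 are siblings, and 1.2 shares data with 2.1.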

In order to derive an architecture model for a CONSENSUS system, the specifications
in the requirements model need to be translated into specifications for the different
KBAs which make up the system, together with a specification for the architecture
and communication between them:

• Communication Channels: The data-flows and control-flows of the data- and
control-flow diagrams need to be translated into messages passed through
predefined communication channels, observing the restrictions imposed by the
architecture.

• Local Database: All data which is required for processing, the results of
processing, and messages to and from other KBAs appear in the local database.
The data-stores also need to be included, either in dedicated KBAs or as a special
section within a KBA.

• Knowledge Sources: The actual functionality of the system, as described by the
requirements model, will be translated into a collection of knowledge sources
which operate on the contents of their local database or request
communication with other KBAs.

• Local Controller: The local controllers exert control over the action of the local
knowledge sources and act as the postman on behalf of their knowledge sources
for communications with other KBAs. If there are any control requirements left
after the functional requirements have been translated into knowledge sources,
they have to be implemented by the local controller.
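The four elements above can be pictured as a small simulation of a KBA. The sketch below is illustrative only and is not the Muse implementation used in the project. Each KBA holds a local blackboard (its local database), a list of knowledge sources, and the set of KBAs it has channels to. Its local controller posts incoming messages, runs the knowledge sources, and queues their outgoing messages, refusing any message for which no channel exists. The `deliver` function stands in for the communication channels. The knowledge source `level_check`, its 300 ft tolerance, and the blackboard keys are all hypothetical:

```python
from collections import deque


class KBA:
    """A knowledge based agent: local blackboard, knowledge sources, controller."""

    def __init__(self, name, channels):
        self.name = name
        self.channels = set(channels)  # KBAs this agent has a channel to
        self.blackboard = {}           # local database
        self.sources = []              # local knowledge sources
        self.inbox = deque()           # (key, value) messages from other KBAs
        self.outbox = []               # (target, key, value) awaiting delivery

    def add_source(self, source):
        self.sources.append(source)

    def step(self):
        """One cycle of the local controller."""
        # Messages from other KBAs appear in the local database.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.blackboard[key] = value
        # Each knowledge source reads the blackboard and returns local
        # updates plus messages for other KBAs.
        for source in self.sources:
            updates, messages = source(self.blackboard)
            self.blackboard.update(updates)
            for target, key, value in messages:
                # The controller acts as postman, but only along declared channels.
                if target not in self.channels:
                    raise ValueError(f"{self.name} has no channel to {target}")
                self.outbox.append((target, key, value))


def deliver(kbas):
    """Move every queued message to the inbox of its target KBA."""
    for kba in kbas.values():
        for target, key, value in kba.outbox:
            kbas[target].inbox.append((key, value))
        kba.outbox.clear()


def level_check(bb):
    """Hypothetical knowledge source: flag a level deviation over 300 ft."""
    if "altitude" in bb and "cleared_altitude" in bb:
        if abs(bb["altitude"] - bb["cleared_altitude"]) > 300:
            return {"deviation": True}, [("MMI", "warning", "level deviation")]
    return {}, []


kbas = {"Monitor": KBA("Monitor", ["MMI"]), "MMI": KBA("MMI", ["Monitor"])}
kbas["Monitor"].add_source(level_check)
kbas["Monitor"].inbox.extend([("altitude", 31000), ("cleared_altitude", 33000)])
kbas["Monitor"].step()
deliver(kbas)
kbas["MMI"].step()
print(kbas["MMI"].blackboard)
```

Keeping the channel check in the controller, rather than in each knowledge source, is one way to reflect the constrained inter-KBA communication that the method uses to encourage modularity.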

3.4  Overall  Principles 

Superficially, the process of deriving a system specification from a requirements
model and an architecture model would appear to be a straightforward one. However,
experience with the design for the first demonstrator application (described below)
has shown that the transition from the functional decomposition used in the requirements
model to the agent based design used in the architecture model needs guidance. This
problem has received considerable attention in the project, and guidance on it has become
a part of the method.


[Figure residue, labels only: Simulator (1); ATC Workstation (2); Controller; Simulation; Scenario Control; Scenario Changes; Simulator Display; Supervisor Display; Simulator Metrics; ATC Metrics; Datalink Commands; Datalink Data; Weather Conditions; Warnings; Flight Plan; Commands; Aircraft Details; Aircraft Positions. Legend: data flows; control flows]


Fig. 4. Air Traffic Control Application


The Simulator. The simulator generates scenarios for the ATC workstation and
responds to messages from it. There are some pre-defined scenarios, and scenario
changes can be incorporated from a supervisory console. Performance metrics are
also catered for. The simulator (process 1) breaks down into five (top-level)
processes. A Traffic Generation process generates the air traffic that appears during a
simulation run. A Route Planner generates and updates an aircraft's flight plan. An
Aircraft Movement Controller calculates an aircraft's movement from its details,
flight plan and the weather conditions. An Advice Checker checks the conflict
avoidance advice and information requests coming into the simulator. A Weather
Details process manages the weather within the air sector and those sub-sectors that
bound the air sector.


The ATC workstation. The ATC workstation (process 2) consists of the following
processes. (Compared to the original conception, the functionality has been slimmed
down.) The Predictor calculates predicted courses of aircraft (up to 20 minutes
ahead), and warns of any impending conflicts. It also performs prediction services for
the other tools. The Wotifer (from "what if..?") assists the controller in the rapid
formulation of a plan for the movement of an aircraft through his sector. For example,
it offers default plans to the controller, or fills in plan details. The Aircraft Arrival
process sets up the details of an aircraft before it enters the control of the ATC
workstation. The Communicator handles the rapid communication of messages for
datalink transmission and the display of datalink messages received. It also validates
messages to aircraft against the aircraft details. The Monitor monitors aircraft for
conformance with flight plan, or for any unusual/abnormal behaviour. It also monitors
controller instructions against aircraft capability. The MMI component displays the
workings of other components, providing a visual indication of conflicts, deviations
from flight plan, recommended routes and so on.
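As an illustration of the kind of computation the Predictor performs, the sketch below extrapolates two aircraft over a 20-minute horizon and reports the first sampled time at which they come within a separation threshold. Everything about the method here is an assumption, since the paper does not describe the Predictor's actual algorithm:

* each aircraft flies a straight line at constant speed, heading and climb rate;
* positions are sampled every 30 seconds;
* coordinates are flat x/y values in nautical miles;
* the separation thresholds are 5 NM laterally and 1000 ft vertically.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """Current state of one aircraft (flat coordinates; an assumption)."""
    callsign: str
    x_nm: float          # east position, nautical miles
    y_nm: float          # north position, nautical miles
    alt_ft: float
    heading_deg: float   # degrees clockwise from north
    speed_kt: float
    climb_fpm: float = 0.0


def position_at(track, minutes):
    """Straight-line extrapolation of a track `minutes` ahead."""
    dist_nm = track.speed_kt * minutes / 60.0
    rad = math.radians(track.heading_deg)
    return (track.x_nm + dist_nm * math.sin(rad),
            track.y_nm + dist_nm * math.cos(rad),
            track.alt_ft + track.climb_fpm * minutes)


def first_conflict(a, b, horizon_min=20.0, step_min=0.5,
                   lateral_nm=5.0, vertical_ft=1000.0):
    """First sampled time (minutes) at which both separations are lost, else None."""
    for i in range(int(horizon_min / step_min) + 1):
        m = i * step_min
        ax, ay, az = position_at(a, m)
        bx, by, bz = position_at(b, m)
        if math.hypot(ax - bx, ay - by) < lateral_nm and abs(az - bz) < vertical_ft:
            return m
    return None


# Two hypothetical aircraft approaching head-on at the same level.
a = Track("A1", 0.0, 0.0, 33000.0, 90.0, 480.0)
b = Track("B2", 80.0, 0.0, 33000.0, 270.0, 480.0)
print(first_conflict(a, b))
```

With a closing speed of 960 kt (16 NM per minute), lateral separation drops below 5 NM shortly after 4.5 minutes, so the first sampled conflict is at 5.0 minutes. Sampling can miss a brief loss of separation between samples. A closed-form closest-point-of-approach calculation would avoid that, at the cost of a less direct reading of "predicted courses".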


4.2 Design Illustrations

To show the method at work, the Monitor component (process 2.5) of the ATC
workstation will be used as an example. The examples come from the design
documentation and source code of the demonstrator application (... indicates omitted
text).

The Monitor. The Monitor is divided into several sub-processes, as shown in the
requirements model diagrams of Figures 5 and 6. (Circles represent processes;
unbroken lines are data flows; dotted lines are control flows, which basically enable
or disable processes; pairs of horizontal lines indicate data stores.) The Command
Monitor monitors aircraft for conformance with clearances. It needs details of the
aircraft's position and the last command issued to it. The Command Checker
compares each new command (as it arrives) with the flight plan for the appropriate
aircraft. The Flight Plan Override regenerates a last command in cases where an
aircraft is off the flight plan and for some reason there is no last command. The Flight
Monitor checks aircraft positions against the current flight plan. If the aircraft is not
on the flight plan (in four dimensions) within the same degree of accuracy as used by
the Command Checker, above, then various validation and warning steps are taken.
The Behaviour Monitor checks each aircraft's position and history (previous
positions) against the aircraft's performance statistics. If, after taking weather
conditions into account, the aircraft appears to be behaving abnormally, then various
error reports are sent out. The Boundary Monitor keeps track of whether, and when,
an aircraft leaves or enters a sector. The New Aircraft Checker determines whether or
not an aircraft, which is under the controller's control and is approaching the sector,
will enter the sector at the expected position and time.


Fig. 5. Data Flow Diagram for the Monitor


[Figure residue, labels only: Last Command; Sector]


Fig. 6. Control Flow Diagram for the Monitor


Requirements: Sample Process Specification. The requirements model specifies
what the system is to do. Its main components are data and control flow diagrams, and
textual process specifications (PSPECs). An example of a PSPEC, taken from the
design document for the Monitor process, is shown below. It states the requirements
for the Command Monitor (process 2.5.1). A language standard for the writing of
PSPECs was devised as a part of the work of the project. The example borrows from
various sources: constant definitions (const), structure accesses (by .) and so on.
Braces are used for the introduction of an abbreviation as well as for comments. Some
auxiliary function definitions (beginning with Function) are used to extract lengthy
pieces of a PSPEC, or pieces found to occur in several PSPECs.


Command Monitor 2.5.1: PSPEC

const Epsilon Direction {ED} = 1 {x (degree): allowed
    variance in course}

When receive Aircraft Positions {AP}
    if there is a Last Command {LC} [for LC.Id = AP.Id]
    then
        ...
    else
        create AP.Id in LC with value null
    endif
endwhen

Function Course_Diff(angle1, angle2)
    {Given two angles calc. the difference between them,
     taking the 0/360 boundary into account}
    angle := abs(angle1 - angle2)
    if (angle > 180) then
        angle := (360 - angle)
    endif
    return angle

When get Issued Command {IC}
    case IC.Message Type of
        H,S,T,W : store Message in LC for AP.Id
                  (less Message Id, plus time now)
        P : store null in LC for AP.Id
        Otherwise : {A,C,F,N,O,U,X} do nothing
    endcase
endwhen
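The PSPEC above can be transcribed into executable form. The following Python sketch is not project code, and it rests on these assumptions:

* the Last Command store {LC} is a dict keyed by aircraft identifier;
* messages are dicts with hypothetical field names (`aircraft_id`, `message_id`, `message_type`, `heading`, `course`);
* where the PSPEC writes AP.Id, the store is keyed on the issued command's own aircraft identifier.

The `then` branch of the Aircraft Positions handler is omitted in the source. The course comparison against ED below is a hypothetical stand-in for it, chosen only because ED is described as the allowed variance in course:

```python
ED = 1.0  # Epsilon Direction: allowed variance in course, in degrees

STORE_TYPES = {"H", "S", "T", "W"}  # message types stored as the last command
CLEAR_TYPES = {"P"}                 # message types that clear the last command
# All other types (A, C, F, N, O, U, X) leave the store unchanged.


def course_diff(angle1, angle2):
    """Difference between two angles, taking the 0/360 boundary into account."""
    angle = abs(angle1 - angle2)
    if angle > 180:
        angle = 360 - angle
    return angle


def on_issued_command(last_command, ic, now):
    """When get Issued Command {IC}: update the Last Command store {LC}."""
    if ic["message_type"] in STORE_TYPES:
        # Store the message less its Message Id, plus the time now.
        stored = {k: v for k, v in ic.items() if k != "message_id"}
        stored["time"] = now
        last_command[ic["aircraft_id"]] = stored
    elif ic["message_type"] in CLEAR_TYPES:
        last_command[ic["aircraft_id"]] = None


def on_aircraft_position(last_command, ap):
    """When receive Aircraft Positions {AP}: return True if a deviation is found."""
    if ap["aircraft_id"] not in last_command:
        # create AP.Id in LC with value null
        last_command[ap["aircraft_id"]] = None
        return False
    lc = last_command[ap["aircraft_id"]]
    if lc is None or "heading" not in lc:
        return False
    # Hypothetical stand-in for the omitted 'then' branch.
    return course_diff(ap["course"], lc["heading"]) > ED


lc_store = {}
on_issued_command(lc_store, {"aircraft_id": "BA123", "message_id": 7,
                             "message_type": "H", "heading": 90}, now=100.0)
print(on_aircraft_position(lc_store, {"aircraft_id": "BA123", "course": 92}))
```

Course_Diff is transcribed directly: a difference of 358 degrees across north is folded to 2 degrees. The course of 92 against a stored heading of 90 therefore exceeds ED, and the call prints True.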


Architecture: Design Decisions. The architecture model specifies how the system is
to be structured; it is the allocation of available means to fulfil requirements. The
CONSENSUS architecture model is built by identifying Knowledge Based Agents
within the requirements model. It is recorded simply by grouping process numbers
using set notation. For example, to state that sub-processes 2.5.1 and 2.5.5 form the
first Knowledge Based Agent (KBA) in the architecture design for process 2.5,
one writes: KBA2_5.1 = {2.5.1, 2.5.5}.
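Because the architecture is recorded as sets of process numbers, simple set operations can check it against the requirements model. The sketch below reports processes allocated to no KBA, to more than one KBA, or absent from the requirements model. It assumes that the Monitor's seven sub-processes are numbered 2.5.1 to 2.5.7. Only KBA2_5.1 is taken from the text; the other two groupings are hypothetical:

```python
def check_allocation(processes, kbas):
    """Check that KBAs partition the given processes.

    processes: set of process numbers from the requirements model
    kbas:      dict KBA name -> set of process numbers
    Returns (unallocated, duplicated, unknown) sets of process numbers.
    """
    seen, duplicated = set(), set()
    for members in kbas.values():
        duplicated |= seen & members
        seen |= members
    return processes - seen, duplicated, seen - processes


monitor = {f"2.5.{i}" for i in range(1, 8)}  # assumed numbering 2.5.1 .. 2.5.7
kbas = {
    "KBA2_5.1": {"2.5.1", "2.5.5"},           # from the example in the text
    "KBA2_5.2": {"2.5.2", "2.5.3"},           # hypothetical
    "KBA2_5.3": {"2.5.4", "2.5.6"},           # hypothetical
}
print(check_allocation(monitor, kbas))  # 2.5.7 is not yet allocated
```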

An important part of the design side of the method is the provision of guidelines for
KBA identification. A number of factors can influence the architecture design
corresponding to a requirements model, because there is some indeterminacy in the
notion of an "architecture". In an abstract sense, an architecture states how a
requirement will be fulfilled, while in a concrete sense an architecture relates
requirements to physical means for meeting them. The notion of a KBA used in the
design process is somewhat indeterminate: sometimes it is a unit in an abstract
architecture for distributed problem solving, sometimes a processor, or a process
running on a processor. There may also be uncertainty about the number of available
processors, or a practical limit on the number of processes per processor. This
indeterminacy has consequences when trying to identify appropriate KBAs. Several
different criteria can be used: for example, the sharing of data between processes, and
hence the amount of communication if the processes were separated; the balance of
functionality between processes, or their reasonable size (sometimes based on the size
of the PSPEC in the requirements specification); the natural concurrency in a
problem; and a resemblance to a standard parallel architecture (for example, pipelining).
These criteria are not all equally compelling. For example,
communication concerns are concrete, and may conflict with a "logical" architecture
design based on a natural concurrent decomposition of the application.

In this example, the reasoning about KBA identification went as follows.

1: The best (and easiest) metric to use is that of datastore usage. Diagrams and
PSPECs can indicate the balance of datastore read operations (input flows) to
datastore write operations (output flows); the former are to be considered more
expensive.

2: Internal datastore usage is considered first. The group {2.5.1, 2.5.2, 2.5.3} share
the Last Command datastore, and {2.5.6, 2.5.7} share Time in Sector.




3: A criterion of external datastore usage is then employed. The group {2.5.4,
2.5.6, 2.5.7} are linked by the use of the Flight Plans datastore flow. (The
Aircraft Positions flow does not come from a datastore.) Adding in this metric
yields the groupings {2.5.1, 2.5.2, 2.5.3} and {2.5.4, 2.5.6, 2.5.7}.

4: This leaves the problem of allocating 2.5.5. It can either be a separate KBA or
be amalgamated with one of the other two. It is too small to be a KBA, so one
has to decide which of the others to add it to.

5: Looking at the use of auxiliary functions in PSPECs, it is found that 2.5.5 has
the use of Course_Diff in common with 2.5.1 and 2.5.2. A natural solution is to
put it with these. There are now two KBAs corresponding to the requirements
model of process 2.5: KBA2_5.1 = {2.5.1, 2.5.2, 2.5.3, 2.5.5} and KBA2_5.2 =
{2.5.4, 2.5.6, 2.5.7}.

6: Confirmation of this decision is sought by examining the size of the two KBAs,
based on the total size of the PSPEC text of the grouped processes. This reveals
KBA2_5.1 to be about four and a half pages, with KBA2_5.2 about four pages.
It was therefore concluded that this identification of KBAs would work
reasonably well.
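Steps 2 to 5 amount to a simple grouping procedure. Below is a minimal Python sketch of it. It assumes the datastore and auxiliary-function usage named in the steps is the complete picture, and the method itself offers guidelines rather than this algorithm. Processes that share a datastore are merged with union-find. A process left on its own is then attached to the group with which it shares the most auxiliary functions. The size check of step 6 is left to the designer.

```python
from collections import defaultdict

def group_by_shared_stores(processes, store_users):
    """Union-find: processes that share any datastore end up in one group."""
    parent = {p: p for p in processes}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for users in store_users.values():
        first, *rest = list(users)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(set)
    for p in processes:
        groups[find(p)].add(p)
    return [frozenset(g) for g in groups.values()]

def attach_singletons(groups, function_users):
    """Merge each one-process group into the multi-process group with which
    it shares the most auxiliary functions (ties go to the first group)."""
    big = [set(g) for g in groups if len(g) > 1]
    for g in groups:
        if len(g) != 1:
            continue
        (p,) = g

        def shared(target):
            return sum(1 for users in function_users.values()
                       if p in users and users & target)

        max(big, key=shared).add(p)
    return [frozenset(g) for g in big]

processes = ["2.5.1", "2.5.2", "2.5.3", "2.5.4", "2.5.5", "2.5.6", "2.5.7"]
datastores = {                                    # steps 2 and 3
    "Last Command": {"2.5.1", "2.5.2", "2.5.3"},
    "Time in Sector": {"2.5.6", "2.5.7"},
    "Flight Plans": {"2.5.4", "2.5.6", "2.5.7"},
}
aux_functions = {"Course_Diff": {"2.5.1", "2.5.2", "2.5.5"}}  # step 5

kbas = attach_singletons(group_by_shared_stores(processes, datastores), aux_functions)
for kba in kbas:
    print(sorted(kba))
```

On these inputs the sketch reproduces KBA2_5.1 = {2.5.1, 2.5.2, 2.5.3, 2.5.5} and KBA2_5.2 = {2.5.4, 2.5.6, 2.5.7}.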

The place of these two KBAs in the final architecture of the ATC workstation is
shown in Fig. 7. They appear as Monitor-Command and Monitor-Flight. (Note that,
to ease the drawing task, the same Data Base appears many times.)


Fig.  7  Architecture  of  the  ATC  Workstation 


Implementation. The architecture design determines how many Muse processes will
be needed, since each Knowledge Based Agent is assigned a corresponding Muse
process. (Obviously, the availability of machines and memory capacity can make
certain decisions unrealisable.) A typical Muse application would exploit the
following elements. Knowledge sources are scheduled on an agenda when their
rulesets are notified of changes to objects in various working memory areas
("databases"). Some of these databases are associated with knowledge sources and
rulesets, while others stand alone ("notice boards"). Procedure calls and message-passing
can supplement the work of rules, or can be used independently. A set of data channel
objects capture external information from processes linked to one another by socket
connections. Streams to and from these processes are objects to which write and read
messages can be sent. Each Muse program is regarded as an object that contains all
the application structure in its slots and methods.
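The following minimal Python sketch shows the event-driven pattern described above. It is not the Muse API. The class names `Database`, `KnowledgeSource`, `Activation` and `Agenda` are hypothetical. A knowledge source's ruleset is reduced to one condition function, and a numeric priority orders the agenda. Writing to a database notifies the interested knowledge sources, which are scheduled on the agenda and may in turn write further objects.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class KnowledgeSource:
    name: str
    priority: int                          # lower value runs first
    condition: Callable[[str, Any], bool]  # stands in for a ruleset's trigger
    action: Callable[[Any, str], None]     # called with (database, key)

@dataclass(order=True)
class Activation:
    priority: int
    seq: int                               # FIFO among equal priorities
    ks: KnowledgeSource = field(compare=False)
    key: str = field(compare=False)

class Agenda:
    def __init__(self) -> None:
        self._heap: list[Activation] = []
        self._seq = 0

    def schedule(self, ks: KnowledgeSource, key: str) -> None:
        heapq.heappush(self._heap, Activation(ks.priority, self._seq, ks, key))
        self._seq += 1

    def run(self, db: "Database") -> None:
        # Actions may write to the database and so schedule more activations.
        while self._heap:
            act = heapq.heappop(self._heap)
            act.ks.action(db, act.key)

class Database:
    """A working-memory area whose writes notify interested knowledge sources."""
    def __init__(self, agenda: Agenda) -> None:
        self.objects: dict[str, Any] = {}
        self.watchers: list[KnowledgeSource] = []
        self.agenda = agenda

    def write(self, key: str, value: Any) -> None:
        self.objects[key] = value
        for ks in self.watchers:
            if ks.condition(key, value):
                self.agenda.schedule(ks, key)

# Usage: one knowledge source raises an alert, another records it.
log: list[str] = []
agenda = Agenda()
db = Database(agenda)
db.watchers += [
    KnowledgeSource("low-altitude", 1,
                    lambda k, v: k.startswith("alt:") and v < 1000,
                    lambda d, k: d.write("alert:" + k[4:], "low")),
    KnowledgeSource("recorder", 2,
                    lambda k, v: k.startswith("alert:"),
                    lambda d, k: log.append(k)),
]
db.write("alt:BA123", 800)
db.write("alt:AF9", 5000)
agenda.run(db)
print(log)  # ['alert:BA123']
```

Scheduling is separated from execution here, so a knowledge source never runs inside another's write. This only loosely mirrors the agenda described above.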

4.3 Dynamic Tactical Planning

The Dynamic Tactical Planning application (DTP) was chosen as the second driver
application for a number of reasons. The scope of the method includes guidance on
issues, such as cooperation, that need to be addressed in developing distributed
intelligent systems, but the requirement for these in the ATC system was minimal. In
order to support the further development of the method and demonstrate its versatility,
it was therefore important for the second application to have a clear requirement for
cooperation between agents.

This application consists of two teams of three autonomous agents manoeuvring
within a bounded hexagonal grid. The agents have two principal attributes:
ammunition available and sustainable damage.

The arena of combat consists of a hexagonal grid with a home base for each aircraft in
each of the six corners. The goal of each aircraft is to maximise damage inflicted on
an enemy while minimising damage to itself. Each move consists of three phases,
which in turn consist of a possible movement and possible firings. The planning
cycle for each aircraft addresses the next three actions, where all three actions for each
aircraft are taken together.

Although  the  aircraft  agents  were  autonomous,  for  the  most  part  their  outward 
behaviour  was  that  of  a  concerted  team  effort. 

5  Conclusions 

The objective of the CONSENSUS project has been the production of a method for
the development of large real-time concurrent knowledge based applications. The
scope of the method encompasses the complete product life-cycle, including
functional analysis, the design approach, detailed design, implementation,
verification, validation, testing, maintenance and support.

The method has been under continuous development and refinement throughout the
duration of the project, with team members, distributed over three sites, occupying
the roles of both developers and users of the method. The requirements modelling phase
of the method, which is based on proven techniques for large scale development, was
considered to be the most effective component of the method.




The method supports architectural design through a set of guidelines detailing how
the processes and data stores of the requirements model should be allocated to the
knowledge based agents, the principal architectural components of the design, while
taking into account any system resource limitations. These guidelines were employed
extensively and effectively on both demonstrator applications.

One of the end products of the design process is a set of Process Specifications
(PSPECs). The PSPECs represent the functionality of the system and are usually
expressed as structured English, truth tables, etc. The mapping of PSPECs onto the
knowledge sources was performed as laid down in the method guidelines. The
correspondence between the data flows of the requirements model and the
architecture interface flows of the architectural design, together with the neutrality of
functional decomposition, ensures that the PSPECs transform naturally onto the
structure required for knowledge sources within an agent. The use of formal
knowledge acquisition tools, as part of the requirements analysis, is likely to help this
process.

An effective approach to the development of hierarchical designs is based on using
the notion of an abstract high level knowledge based agent, which has little or no
functionality of itself, but which defines a decomposition into a number of other
lower level knowledge based agents. This was exemplified by the architecture
selected for the second application, where abstract knowledge based agents were used
to provide conceptually simplified routing of inter-agent messages.

Accurate architectural interface flow descriptions were of considerable benefit to the
designers of the demonstrator, providing the required interface specification between
the many processes used to implement the knowledge based agents of the application.
The flow descriptions of the architectural design provided a base from which
verification could be performed. In particular, the flow descriptions provided strong
support for the construction of test harnesses for agent testing.

The documentation standards for the method, together with their associated document
notation, are an important component of the CONSENSUS method. The benefits of
applying such standards were demonstrated by the ease with which the development
team were able to generate, share and review unambiguous details of both
specification and design. The quality of documentation was the key to the support of
the multi-site development of the demonstrator applications. In particular, the
documentation standards were considered to be a major contribution to the ease of
development of the second application.

The method calls for the requirements modelling and architectural design phases to be
carried out as concurrent activities. This was reflected in the structure defined for the
design level documents, which required the parallel exposition of requirements
modelling and architectural design within the same design document. This approach
was considered to provide exceptional benefits, and vindicated its adoption as part of
the documentation standards for the method.

Support for the design of large scale systems is provided directly through the process
of functional decomposition, which was applied with success to both demonstrator
applications. The ability to generate a hierarchical architecture that corresponds
to the process decomposition of the requirements analysis takes
support for large scale systems even closer to implementation.

Knowledge based agents are the basic building blocks of the architectural design.
This design approach leads to a design that is free from deadlock, and ensures
consistency within the system while being able to meet real-time constraints, thereby
providing the necessary support for the construction of continuous real-time
applications.




The requirements model helps identify where concurrency exists in an application,
and thereby enables concurrent designs to be created. However, the method does not
directly support the exploitation of available concurrent computing resources where
concurrency has not been identified through the modelling of requirements.

Functional decomposition was found to be good at highlighting where more
knowledge about the system was required. The method provides no direct support for
knowledge elicitation. It limits itself to defining a blackboard structure for the
application's architectural elements in terms of knowledge based agents, where the
blackboard is a generic platform supporting pattern-directed inference, which can
employ a number of representation schemes. The application knowledge is
represented in the PSPECs of the requirements model, which are transformed into the
behaviour of the knowledge sources within an agent.

In conclusion, the CONSENSUS method has been used in the construction of two
applications, and has been refined in the light of the experience that has been gained.
The project concluded at the end of February 1993, and the final version of the method
will be available from the end of March [1].

References 

1 M. Bateman: The CONSENSUS Method Final Report (D17). Available from the
authors

2 M. Bell & B. Clark: Research into Air Traffic Tools (RATT). Final Report No.
C22779-FR-(X)la, Cambridge Consultants Limited

3 A. Bond & L. Gasser: An Analysis of Problems and Research in DAI. In: A. Bond &
L. Gasser (Eds.): Readings in Distributed Artificial Intelligence. Morgan Kaufmann 1988

4 Muse: System Guide and Reference Guide. Cambridge Consultants Limited 1988


Script  and  Frame:  Mixed  Natural  Language 
Understanding  System  with  Default  Theory 


Honghua  Gan* 

Department  of  Computer  Science 
University  of  Exeter 
Exeter,  EX4  4PT  UK 
Email: hga@dcs.exeter.ac.uk


Abstract 

Minsky’s frame [7] and Schank’s script [10] are two leading representation
languages in natural language understanding systems for understanding stories.
Very different inference mechanisms are embedded in the two representations:
property inheritance along the taxonomic structure in the frame system vs.
cause-effect connectivity among a sequence of events in the script. This paper
attempts to merge them into a mixed understanding system. In particular, it uses
default logic to formalize the system's inference process and to create a causalized
default theory. The idea is to retain the frame system as the basic structure for events.
A script is then organized that specifies the normal way events happen in a specific
situation, by connecting the relevant lower level frames with causal relationships.


1  Introduction 

One of the aims of Artificial Intelligence is to make computers understand human
natural language. Far better than current machines, human beings can use
so-called common-sense knowledge in their everyday reasoning to help understand
stories, make decisions, and take actions. In order to give computers similar
abilities, many more powerful inference mechanisms, both formal and informal,
have been presented [e.g. 6, 7, 8, 10].

For understanding a story, two classical theories are at hand. For a descriptive
story such as a news story [14], frame systems work well; for a sequence of events
involving actions, scripts seem to be more suitable and efficient [10]. In both frame
systems and scripts, there are defaults hidden in the explicit inference mechanisms.
Property inheritance is a main feature of frame systems.

*The author is supported by the Sino-British Friendship Scholarship Scheme (SBFSS) and
the Natural Science Foundation of China.




Roughly speaking, properties can be shared by different frames and can normally be
inherited by lower level frames from higher level frames along the taxonomic
structure, unless the contrary is explicitly filled in the local slots. In many
situations the property inheritance process is defeasible, and it can be formalized
by default theory [1,2]. Scripts, on the other hand, embody a cause-effect chain among the
event sequence of the story in specific situations. In fact, the script specifies the
normal way in which things take place, based on commonly accepted experience, and
supposes that these events have really happened unless the story says something
different. This normal supposition of what has taken place is defeasible too, and
much information must be gap-filled in the understanding process. An initial
attempt at formalizing scripts and script-based understanding as a default theory
based on default logic has been made in [5].

This paper considers such understanding systems further. Based on an
investigation of frame-based and script-based understanding systems, a mixed
representation mechanism is proposed, and an attempt to formalize the mixed
inference as a default theory is then provided.

2  Representing  Static  Properties  in  Frames 

The frame system is a leading knowledge representation language; as Minsky
proposed, it can be compared with the dominant logic-based mechanisms. The
essence of the theory is that “when one encounters a new situation (or makes
a substantial change in one’s view of the present problem), one selects from
memory a structure called a frame.” [7] The basic features can be extracted as
follows:

•  A  frame  is  a  data  structure  for  representing  a  stereotyped  situation,  like 
being  in  a  certain  kind  of  living  room,  or  going  to  a  child's  birthday  party. 

•  For  visual  scene  analysis,  the  different  frames  of  a  system  describe  the 
scene  from  different  viewpoints,  and  the  transformations  between  one 
frame  and  another  represent  the  effects  of  moving  from  place  to  place. 
Different  frames  of  a  system  share  the  same  terminals. 

•  A  frame’s  terminals  are  normally  already  filled  with  “default”  assignments. 
Shared  frames  can  inherit  these  default  assignments  from  the  higher  level 
frames  along  the  taxonomic  structure. 

Frames are therefore essentially suited to representing static or descriptive
properties from one viewpoint. For example, suppose that a natural language story
contains a series of isolated events. All descriptive properties of these events are
easy to represent in a frame system. One simple example is the representation of
disaster events in the event frame [14] shown in Fig. 1.




Figure  1:  Disaster  Events  in  A  Frame  System 


It is easy for computers to understand such stories. When asked questions about
the disaster, the understanding system can answer correctly through inheritance
inference.
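Inheritance of this kind can be sketched in a few lines of Python. The hierarchy and slot values below are illustrative assumptions. They loosely follow Example 1 later in the paper and are not the contents of Fig. 1. A lookup returns a locally filled slot if there is one and otherwise climbs the is-a links. A local value therefore overrides an inherited default.

```python
from typing import Any, Optional

class Frame:
    """A frame with locally filled slots and a single is-a link upwards."""
    def __init__(self, name: str, parent: Optional["Frame"] = None, **slots: Any) -> None:
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot: str) -> Any:
        # A locally filled value wins; otherwise climb the taxonomy.
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"slot {slot!r} not found for frame {self.name!r}")

# Illustrative hierarchy (hypothetical slot values).
event = Frame("event", time="unknown", place="unknown")
disaster = Frame("disaster", parent=event, forcer="elements")
earthquake = Frame("earthquake", parent=disaster)
story_quake = Frame("earthquake-1", parent=earthquake,
                    place="Lower Slabovia", time="today", killed=25)

print(story_quake.get("forcer"))  # inherited from disaster: elements
print(story_quake.get("place"))   # filled locally: Lower Slabovia
```

Asking for `forcer` illustrates the "inheritance inference" mentioned above: the story never states it, yet an answer is available from a higher frame.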

3  Normal  Sequence  of  Actions  in  Scripts 

In spite of the obvious bias towards static properties, Minsky and some other
researchers [7,4] tried to represent action-type events in their frame systems, as
so-called “scenario frames”, but these have not worked well. A similar attempt is
Schank’s script, which describes in detail how events are supposed to unfold in specific
situations. With the help of scripts, the understanding system easily learns the
causal relations between isolated events given by the story.

The characteristics of script-based understanding systems can be summarized as
follows:

•  The  script  is  used  for  representing  a  well-known  common  situation,  which 
is  typically  commonsense  knowledge.  The  script  is  heavily  based  on  past 
commonly-accepted  experience,  reflecting  the  normal  way  of  something 
happening. 




•  There are causal relations between adjacent events in the script. The
understanding system gives a causal explanation for the isolated events in
the story by connecting them with events in the script.

•  There are some default slots in the script, which are expected to be filled as
the story is input step by step. Once filled, they can normally be referenced
with the same value throughout.

Unlike frames, scripts depend upon causal relationships to pass
useful information between events. That is, the inference mechanism in
the script-based understanding system is a causalized default reasoning, which
supposes that something has happened between ill-connected events from the story and
fills in the gaps to make them well-connected.

It is easy to see that the script itself is defeasible knowledge specifying the
normal way of actions. When something different is explicitly mentioned in the
real story as it arrives step by step, the events in the script have to be overridden
by these facts.

In a stepwise story understanding system, we suppose that the story consists of
a sequence of events $e_1, e_2, \ldots, e_n$, which are ill-connected. We must find
events from the script that normally take place to fill in the gaps and make them
well-connected. Suppose the script contains the causally connected events
$se_1, se_2, \ldots, se_t$. The understanding explanation then has the form

$$e_1\,[= se_1, se_2, \ldots, se_i =]\; e_2\,[= se_{i+1}, \ldots, se_j =]\; \ldots\; e_n$$

where the sequence of script events between each pair $e_k, e_{k+1}$ can be empty.
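The bracketed explanation can be computed mechanically once each story event has been matched to a script event. Below is a minimal Python sketch. It assumes matching by exact name and a single linear script, which are simplifications the paper does not make. It also does not model overriding by contrary facts. The restaurant steps are abbreviated from the behaviours listed in Section 4, and the story follows Example 2.

```python
def explain(story: list[str], script: list[str]) -> list:
    """Interleave story events with the script events that fill the gaps.

    Each story event is assumed to match exactly one script event by name,
    and the story is assumed to follow the script order."""
    positions = [script.index(e) for e in story]   # ValueError if unmatched
    if any(a >= b for a, b in zip(positions, positions[1:])):
        raise ValueError("story events are not in script order")
    explanation: list = [story[0]]
    for (i, j), event in zip(zip(positions, positions[1:]), story[1:]):
        explanation.append(script[i + 1:j])   # supposed events; may be empty
        explanation.append(event)
    return explanation

restaurant = ["enter", "sit down", "read menu", "order", "eat", "pay", "exit"]
story = ["enter", "order", "pay", "exit"]
print(explain(story, restaurant))
# ['enter', ['sit down', 'read menu'], 'order', ['eat'], 'pay', [], 'exit']
```

The empty list between "pay" and "exit" corresponds to a null gap in the bracket notation.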

4  Merging  Scripts  and  Frames 

There are some common points between frames and scripts. The most notable is
that both frames and scripts have slots representing some kinds of static
properties of the events. We retain the frame structure as the basic framework of
our mixed system, so that this knowledge is specified only in the frames. In fact,
there are two kinds of defaults in the script. One is object-oriented, i.e. slots
denoting actors, objects etc., as mentioned above. The other is the sequence of
actions itself, which is hard to embed in the frame system. Compared with frames,
the second kind of default in the script specifies dynamic properties of the event
sequence. Our task is to create a mechanism to represent such dynamic properties of
actions in the frame structure; that is, we should capture scripts in the frame
structure in some way.

As Schank proposed [9], actions in a typical story can be classified into a small
set of types, called primitive acts. Primitive acts abstract action behaviours from
the detailed situations and can be installed into a script in a specific case. The
problem is how to represent these primitive acts and their subsumed detailed
actions in frame systems. Recall that frames can also be used for
representing scenarios. Here, however, we do not want the frames to represent the
dynamic process, but rather the static properties of action-type knowledge. That is,
primitive acts are represented as a special kind of frame, which is actually a set
of actions. An action-type frame includes four slots:
actor: ******** (executor of the action)
action: ******** (primitive acts if possible)
object: ******* (the object of the action)
direction: ******* (from and/or to of the action)
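As a data structure, an action-type frame is a four-slot record whose role fillers are bound from the activating descriptive frame. Below is a minimal Python sketch. It assumes role names such as `customer` are plain strings and uses Schank's conceptual-dependency act names as `action` values: PTRANS for a physical transfer and ATRANS for a transfer of possession. The `bind` helper is hypothetical. The `object` slot is called `obj` to avoid shadowing a Python builtin.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ActionFrame:
    actor: Optional[str] = None     # executor of the action
    action: Optional[str] = None    # primitive act if possible
    obj: Optional[str] = None       # the object of the action
    direction: tuple[Optional[str], Optional[str]] = (None, None)  # (from, to)

    def bind(self, roles: dict[str, str]) -> "ActionFrame":
        """Copy the frame, replacing role names by the values filled in
        the activating descriptive frame; the primitive act is kept."""
        def sub(x: Optional[str]) -> Optional[str]:
            return roles.get(x, x) if x is not None else None
        return ActionFrame(sub(self.actor), self.action, sub(self.obj),
                           (sub(self.direction[0]), sub(self.direction[1])))

enter = ActionFrame("customer", "PTRANS", "customer", (None, "restaurant"))
pay = ActionFrame("customer", "ATRANS", "money", ("customer", "cashier"))
bindings = {"customer": "John", "restaurant": "Chinese Restaurant"}
print(enter.bind(bindings))
# ActionFrame(actor='John', action='PTRANS', obj='John', direction=(None, 'Chinese Restaurant'))
```

Keeping the script frames frozen and producing bound copies leaves the script reusable for the next story.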

We obtain a whole event frame system, which includes two categories of
frames: one for descriptive events and the other for action-type events. The
action-type events specify dynamic changes in the event sequence. Some of them
are organized as scripts, each with respect to one specific descriptive frame. So, in
fact, we have two relationships in the frame system: one is inheritance via is-a
taxonomic links, and the other is the cause-effect relation via causal connectivity
in the scripts. Suppose a story is given step by step. If an event is only concerned
with static properties, it is not necessary to invoke the script. The inheritance mechanism
is enough: fill in the slots or obtain the default values through inheritance paths.
Otherwise, if adjacent events are not well-connected, the script must be activated
to help understand why the events look like this. Normally, one event from the
story can get all its information from the is-a relation and the causal relation in
the script.

It should be noted that the inference granularities in the descriptive frame
and in the action-type frames of the script are different. In the descriptive
frame, the relevant slots are customers, waiters, cooks, menus, foods, cashiers
etc. In the script, on the other hand, the focus is on the behaviours of the
customer, waiter etc. The customer John is not important in the whole story;
what is important for understanding is John’s behaviour through the story:
enter the gate, find a place to sit down, look at the menu, order a dish, eat the
food, pay the check, and exit the gate.

Fortunately, it is easy to distinguish these two inference granularities.
Normally, the static properties specified in the descriptive frame are passed by
the inheritance mechanism along the taxonomic structure. Conversely, dynamic
properties specified in the script are supposed to have really taken place, based
on causality. Given a stepwise story, one fills the static properties into the slots
of the descriptive frame. One obtains a causal explanation from changes in the
slots of the action-type frames in the script. In the default theory below, the two
kinds of default reasoning are carried out in different layers, so there is no
confusion at all.




Figure  2:  Mixed  Representation  for  Restaurant 


5  Formalizing  with  Default  Theory 

We suppose that we can create an extendable all-event frame system, which
contains events that could take place at any time. As specified in the last
section, two categories of frames exist in such a system: descriptive frames for
static properties and action-type frames for changes of events. Correspondingly,
there are two distinct inference mechanisms in the system. All descriptive frames
use property inheritance vertically along the is-a taxonomic structure. Constrained
by the descriptive frame, the associated script connects some action-type frames by
causal relations, reflecting the change of events that normally occurs in the situation
described by the descriptive frame.




We try to apply default logic to merge the two different inference mechanisms
in a uniform formalism. Brewka [1] gave an excellent semantic explanation of the
inheritance inference mechanism in frame systems with McCarthy’s
circumscription. Similar considerations can be found in [2], with other
nonmonotonic reasoning formalisms. Reiter’s default logic can give a simple, clear
default theory for typical inheritance inference.

Suppose a frame structure in the frame system consists of a frame name,
several slots and the corresponding (default) values of the slots. Lower level
frames can obtain the (default) values of slots from higher level frames if no
explicit assignments are given at the lower level frames for the relevant slots.
Let $f$, or $f_1, f_2, \ldots$, denote frames; $s$, or $s_1, s_2, \ldots$, slots of a frame;
and $v$, or $v_1, v_2, \ldots$, values of the slots. The inheritance inference can
be formalized as follows:

•  $\dfrac{Hold(f_1, s, v) : isa(f_2, f_1) \wedge Hold(f_2, s, v)}{Hold(f_2, s, v)} \qquad (\delta_{d_i} \in D_1)$

That is, suppose the frame $f_2$ has no (default) assignment for its slot $s$. Then,
according to the inheritance mechanism along the is-a relation (i.e. the frame $f_2$
is a sub-frame of the frame $f_1$), the slot $s$ of $f_2$ takes the same value $v$ as
the corresponding slot of the higher frame $f_1$. Here the predicate
$Hold(f, s, v)$ represents that there exists a slot $s$ with the value $v$ in the frame
$f$, and the predicate $isa(f_1, f_2)$ reads that the frame $f_1$ is a sub-frame of the
frame $f_2$.

Formalizing the script is more complicated. Clearly, inference in the script,
which consists of action-type frames, does not deal directly with the slots and
values of the activating descriptive frame. The inference granularities in the two
layers are quite different: one is about the properties of the slot values of the
activating frame; the other is about the slot and value themselves. Recall the four
slots of an action-type frame: actor, action (primitive acts), object and direction.
The default theory must reflect the actor's and object's behaviours under the default
action and direction, where the actor and object correspond to slot values of the
activating descriptive frame. The defaults look like these:

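•  $\dfrac{:\ causal(fa_1, fa_2) \wedge Hold(fa_2, {*}actor, v_a)}{Hold(fa_2, {*}actor, v_a)} \qquad (\delta_{p_j} \in D_2)$

•  $\dfrac{:\ causal(fa_1, fa_2) \wedge Hold(fa_2, {*}action, v_a)}{Hold(fa_2, {*}action, v_a)} \qquad (\delta_{p_j} \in D_2)$

•  $\dfrac{:\ causal(fa_1, fa_2) \wedge Hold(fa_2, {*}object, v_a)}{Hold(fa_2, {*}object, v_a)} \qquad (\delta_{p_j} \in D_2)$

•  $\dfrac{:\ causal(fa_1, fa_2) \wedge Hold(fa_2, {*}direction, v_a)}{Hold(fa_2, {*}direction, v_a)} \qquad (\delta_{p_j} \in D_2)$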




Here the predicate $causal(fa_1, fa_2)$ represents that there is a causal
relation from the frame $fa_1$ to the frame $fa_2$, where each $fa_i$ refers to an
action-type frame. Informally, the above defaults say the following. Based on the causal
chain in the script, the behaviours of the four slots (corresponding to the values of
the relevant slots of the activating descriptive frame) follow the normal process of
the event's change, unless the contrary is explicitly mentioned in the real story.

Suppose the real story arrives step by step as the facts $\{e_1, e_2, \ldots, e_n\}$.
Together with the defaults about the traditional frame structure and scripts,
we then have a default theory $\Delta(D, W)$, with
$D = D_1 \cup D_2$
$W = \{e_1, e_2, \ldots, e_n\}$

When the fact event $e_1$ arrives, the descriptive frame $f$ is invoked and as many
of its slots as possible are filled in. Then the associated script (if any) is activated,
and it tries to connect the subsequent events $e_2, \ldots, e_n$ as far as it can.

6  Examples 

There are some classical examples for understanding systems, such as
understanding a news story [14], going into a restaurant [10], and
shopping [4]. We analyze two of them with our default theory for the
mixed understanding system.

Example 1:

Today an extremely serious earthquake of magnitude 8.5 hit Lower Slabovia, killing 25 people and causing $500,000,000 in damage. The President of Lower Slabovia said the hard-hit area near the Sadie Hawkins fault has been a danger zone for years.

There is no script at all for understanding this news story, because all the information concerns static properties of the frame earthquake; nothing is said about a sequence of actions that normally happens in an earthquake. The default theory for it is $\Delta(D, W)$, with

$W = \{e\}$ (one event: the earthquake)
$D = D_1$ (only the defaults concerning descriptive frames)

The slots of the frame earthquake obviously have the following values:

place: Lower-Slabovia;
time: Today;
forcer: elements;
killed: 25;
injured: ***;
property-damage: 500 million;
magnitude: 8.5;
fault: Sadie Hawkins.




Example 2:

John went into a Chinese restaurant. He asked for a dish of lobster. He paid the check and left.

Here the story begins with "John's entering the Chinese Restaurant", so the descriptive frame Chinese-Restaurant is invoked. Its slots are filled in as far as possible, and then the associated script is activated. Now the slot *customer* = John, and according to the causal relations among the action-type frames of the script "Restaurant", the default theory produces an event sequence $fa_1, fa_2, \ldots$. When the second event of the story, $e_2$, arrives, the understanding system again fills in the corresponding slot of the descriptive frame and then matches the fact against the action-type frames in the script. (A mismatch means that the story explicitly mentions something contrary, so the corresponding default is blocked.)
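The incremental matching just described can be sketched in Python. Everything here is a simplifying assumption for illustration only: `RESTAURANT_SCRIPT` is a hypothetical, linear restaurant script whose steps are labelled with conceptual-dependency primitives, and each story fact is a `(step, slot, value)` triple. Unlike the full default theory, which predicts the whole sequence $fa_1, fa_2, \ldots$ as soon as the script is activated, this sketch only assumes the unmentioned steps up to the most recently matched fact. A fact that matches no remaining step yields no default conclusions.

```python
# Hypothetical, simplified restaurant script: causally ordered action-type
# frames, each labelled with a conceptual-dependency primitive.
RESTAURANT_SCRIPT = [
    ("enter", "PTRANS"),
    ("read-menu", "ATTEND"),
    ("decide", "MBUILD"),
    ("order", "MTRANS"),
    ("eat", "INGEST"),
    ("pay", "ATRANS"),
    ("leave", "PTRANS"),
]


def understand(story, script=RESTAURANT_SCRIPT):
    """Process story facts one at a time.

    Each fact is (step, slot, value): the script step it describes and the
    descriptive-frame slot it fills (slot may be None).
    Returns the filled descriptive frame and the explained script steps.
    """
    names = [name for name, _ in script]
    frame = {}      # slots of the descriptive frame (e.g. Chinese-Restaurant)
    explained = []  # script steps that are stated or assumed by default
    position = 0    # index of the first script step not yet explained
    for step, slot, value in story:
        if slot is not None:
            frame[slot] = value
        if step not in names[position:]:
            continue  # mismatch: nothing is concluded by default from this fact
        target = names.index(step, position)
        # Steps between the last match and this one are assumed by default.
        explained.extend(script[position:target + 1])
        position = target + 1
    return frame, explained


story = [
    ("enter", "customer", "John"),
    ("order", "food", "lobster"),
    ("pay", "check", "payable"),
    ("leave", None, None),
]
frame, explained = understand(story)
print(frame)  # {'customer': 'John', 'food': 'lobster', 'check': 'payable'}
print(" ".join(f"John *{prim}*" for _, prim in explained[:3]))
# John *PTRANS* John *ATTEND* John *MBUILD*
```

The first three explained steps reproduce the beginning of the causal explanation given below for John's visit. The "eat" step is never stated in the story; it is supplied by default because "pay" follows it in the script.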

Finally, the frame has the following default values for its slots:
customer: John;
waiter: ***;
cook: ***;
menu: on the table;
sitting-at-table: number ***;
food: lobster;
check: payable;
tips: ***;
etc.

Meanwhile, the understanding system gives a causal explanation of what happened when John ate in the Chinese Restaurant:

John *PTRANS*  John *ATTEND*  John *MBUILD*


7  Some  Related  Work  and  Discussion 

Minsky gave a birthday example [7], which was probably the earliest attempt to model a dynamic process of events in frames. Its drawback, however, is that there is no causality between the frames, and detailed frame analysis alone cannot make it explicit. Similarly, Charniak's system [4] did not give a clear causal explanation for frames, although he claimed that his system showed no significant differences from scripts. Evidently, the very important difference between the inference mechanisms of causal relations in scripts and of inheritance in frames was ignored.

Our mixed understanding system based on a default theory should be better than both of them. In practice, nevertheless, it is difficult to organize action-type frames into scripts, and it is sometimes subtle to keep track of the descriptive frame and the action-type frames of its associated script, because some slot values are provided by later events in the story, which may lead to a conflict with the script.

References 

1. G. Brewka: The Logic of Inheritance in Frame Systems. In: Proceedings of IJCAI-87, Vol. 1, pp. 483-488, Milan, Italy, 1987

2. G. Brewka: Nonmonotonic Reasoning: Logical Foundations of Commonsense. Cambridge University Press 1991

3. A. W. Burks: Chance, Cause, Reason: An Inquiry into the Nature of Scientific Evidence. The University of Chicago Press 1977

4. E. Charniak: Inference and Knowledge, Parts I and II. In: E. Charniak and Y. Wilks (eds.): Computational Semantics: An Introduction to Artificial Intelligence and Natural Language Comprehension. Oxford: North Holland, 1976

5. H. Gan: Formalizing Scripts with Default Theory. Technical Report 248, University of Exeter, Department of Computer Science, UK, 1992

6. J. McCarthy: Circumscription - A Form of Nonmonotonic Reasoning. Artificial Intelligence 13:27-39 (1980)

7. M. Minsky: A Framework for Representing Knowledge. In: R. J. Brachman and H. J. Levesque (eds.): Readings in Knowledge Representation, pp. 245-262, Morgan Kaufmann, 1985

8. R. Reiter: A Logic for Default Reasoning. Artificial Intelligence 13:81-132 (1980)

9. R. C. Schank: Conceptual Information Processing. North-Holland Publishing Company 1975

10. R. C. Schank and R. P. Abelson: Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. The Artificial Intelligence Series. Lawrence Erlbaum Associates 1977

11. R. C. Schank and C. K. Riesbeck: Inside Computer Understanding: Five Programs Plus Miniatures. Lawrence Erlbaum Associates, Inc. 1981

12. R. C. Schank: Reading and Understanding: Teaching from the Perspective of Artificial Intelligence. Lawrence Erlbaum Associates, Inc. 1982

13. R. C. Schank: Dynamic Memory: A Theory of Reminding and Learning in Computers and People. Cambridge University Press 1982

14. P. H. Winston: Artificial Intelligence. Addison-Wesley, 2nd edition, 1984


Constructive Matching Methodology: Formally Creative or
Intelligent Inductive Theorem Proving?

Marta Franova, Yves Kodratoff, Martine Gross

CNRS & Université Paris-Sud, LRI, Bât. 490, 91405 Orsay, France

Abstract. In this paper we explain why, and in what sense, the methodology for inductive theorem proving (ITP) that we develop is creative. We also explain why our methodology cannot be said to be "intelligent" in the way a human can be, and why it is nevertheless suitable for a user-independent automatization of ITP.

Introduction 

For almost a decade we have been developing a methodology, called Constructive Matching (CM), for automatizing inductive theorem proving (ITP) that is also suitable for the automatic construction of programs (ACP). The goal of this paper is to emphasize an unusual characteristic of our methodology, which we call formal creativity (FC) and describe later, and to explain its origin in our methodology. As the reader will see, FC describes a behavior of an automated system that resembles the standard routine work of a mathematician. Let us recall what this work looks like.

Suppose  that  a  mathematician  is  proving  a  theorem  called  THEOREM.  During  his 
proof  he  realizes  that  in  order  to  perform  an  operation,  say  OPERATION,  he  needs  to 
prove  a  lemma,  say  LEMMA.  He  starts  to  prove  this  lemma.  At  some  step,  he  realizes 
that  LEMMA  is  not  true  in  general,  but  only  in  particular  cases.  He  realizes  that,  in 
fact, he can use these particular cases to generate a new lemma LEMMA2, which
allows him to perform OPERATION as well. Thus, the mathematician knows how to
analyze his failure to prove LEMMA. He knows that this failure is not significant,
because he has succeeded in finding another way to perform OPERATION.

We speak here of standard routine work because the most interesting proofs found by a mathematician are based on a clever trick, or on a suitable combination of previously proven lemmas that seemingly have nothing to do with the given theorem. We shall also mention systems that behave in this clever way.

Because it is a behavior, FC also concerns the user-independence of automated systems, so we shall say a few words on this topic as well. Finally, we shall specify the place of CM in the automatization of ITP and ACP. Later we shall mention some technical papers describing the CM methods and techniques, as well as the differences between our methods and techniques and those of other approaches to ITP and ACP. In this paper we choose not to present the technical details of CM, so that it can be understood by anyone with some experience in mathematics or logic.

The paper has the following structure. Section 1 recalls the goal of ITP and links it to ACP. Section 2 introduces the notion of formal creativity in the automatization of ITP, as opposed to human creativity, and relates these two types of creativity to types of systems distinguished by their dependence on a user. Section 3 describes how we characterize existing approaches to ITP and ACP with respect to these two types of creativity. In section 4 we explain the origin of FC in our methodology, briefly examining the possibility of transferring our FC to other existing approaches. We also mention the present implementation of CM, report some interesting results, and describe topics for future research.




1 Inductive Theorem Proving and Program Construction

A formal proof of a theorem T in a formal theory $\mathcal{T}$ (theory for short) is an ordered sequence $S = (A_1, A_2, \ldots, A_n, T)$ of formulae such that each formula is either an instance of an axiom of $\mathcal{T}$ or the result obtained from the preceding formulae by using an inference rule of $\mathcal{T}$. The problem of theorem proving consists of finding a formal proof in $\mathcal{T}$ for a given theorem T.
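The definition of a formal proof can be checked mechanically. The Python sketch below does this for a deliberately tiny, hypothetical theory, which is our simplification and not a theory used in the paper. Formulae are either atoms (strings) or implications written as tuples `("=>", A, B)`. Axioms are given literally rather than as schemata, so "instance of an axiom" reduces to membership. Modus ponens is the only inference rule.

```python
# Formulae: atoms are strings; an implication A => B is the tuple ("=>", A, B).


def is_formal_proof(axioms, sequence, theorem):
    """Check that `sequence` is a formal proof of `theorem` from `axioms`,
    using modus ponens (from A and A => B infer B) as the only inference rule."""
    if not sequence or sequence[-1] != theorem:
        return False  # a proof must end with the theorem itself
    for i, formula in enumerate(sequence):
        if formula in axioms:
            continue  # an (instance of an) axiom
        earlier = sequence[:i]
        # Otherwise the formula must follow from preceding formulae:
        # some earlier A together with an earlier A => formula.
        if not any(("=>", a, formula) in earlier for a in earlier):
            return False
    return True


axioms = {"p", ("=>", "p", "q"), ("=>", "q", "r")}
proof = ["p", ("=>", "p", "q"), "q", ("=>", "q", "r"), "r"]
print(is_formal_proof(axioms, proof, "r"))       # True
print(is_formal_proof(axioms, ["p", "r"], "r"))  # False: r is not justified
```

Finding such a sequence, rather than checking one, is the actual theorem-proving problem; the checker only makes the definition precise.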

In inductive theorem proving, we deal with universally quantified theorems, i.e., formulae of the form "for any object, say x, the property F holds". This is usually expressed symbolically as $\forall x\, F(x)$. For these theorems we use a special inference rule, called the induction principle, in order to discard the universal quantifier whenever our objects, represented here by the variable x, belong to a well-founded domain. For simplicity, in this paper we restrict ourselves to the nonnegative integers, which are familiar to any reader. Of course, there are other well-founded domains to which the induction principle can be applied, and such domains are treated in our technical papers. For the nonnegative integers, the induction principle is represented by the scheme

$F(0),\; F(n) \Rightarrow F(n+1) \vdash \forall x\, F(x)$.

This induction scheme says that in order to prove the formula $\forall x\, F(x)$ for natural numbers, it is sufficient to prove the formulae $F(0)$ and $F(n) \Rightarrow F(n+1)$ for an arbitrary natural number n.
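For readers who want to see the scheme stated and discharged in a proof assistant, here is a sketch in Lean 4. The choice of Lean is ours and is not a tool used in the paper. The theorem simply restates the induction scheme for the natural numbers and proves it with Lean's built-in `induction` tactic. The name `nat_induction_scheme` is hypothetical.

```lean
-- The scheme  F(0), F(n) ⇒ F(n+1) ⊢ ∀x F(x)  over the natural numbers.
theorem nat_induction_scheme (F : Nat → Prop)
    (base : F 0) (step : ∀ n, F n → F (n + 1)) : ∀ x, F x := by
  intro x
  induction x with
  | zero => exact base           -- base step: F(0)
  | succ n ih => exact step n ih -- induction step: F(n) ⇒ F(n+1)
```

The hypotheses `base` and `step` correspond to the two formulae that, according to the scheme, suffice to establish $\forall x\, F(x)$.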

The induction principle somewhat changes the notion of a formal proof of a theorem T. Consider a theorem $\forall x\, F(x)$, where x ranges over the nonnegative integers and F is a property we want to prove. A formal inductive proof consists of two ordered sequences of formulae,

$S_1 = (C_1, C_2, \ldots, C_k, F(0))$ and $S_2 = (B_1, B_2, \ldots, B_m, F(n+1))$.

Each formula is an instance of an axiom of the theory, or (in $S_2$) an instance of the formula $F(n)$, called the induction hypothesis, or the result obtained from the preceding formulae by using an inference rule of the theory. We shall call inductive decomposition the transformation of the set {axioms, $\forall x\, F(x)$}, which characterizes the problem of proving $\forall x\, F(x)$ from a set of axioms, into $S'_1 = \{\text{axioms}, F(0)\}$, called the base step, and $S'_2 = \{\text{axioms}, F(n), F(n+1)\}$, called the induction step. Each of these sets expresses the task of proving its last formula from the preceding ones.
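The inductive decomposition for the natural numbers is purely syntactic, as the Python sketch below shows. In this sketch, a formula F is passed as a hypothetical Python function from a term (a string) to a formula (a string), and each resulting task is a pair `(hypotheses, goal)`. This representation is our illustration, not the CM implementation. For general well-founded domains the decomposition is not this simple.

```python
def inductive_decomposition(axioms, F):
    """Split {axioms, forall x F(x)} over the natural numbers into two tasks.

    F maps a term (a string) to the formula F(term) (also a string).
    Each task is (hypotheses, goal): prove the goal from the hypotheses.
    """
    axioms = tuple(axioms)
    base_step = (axioms, F("0"))                     # S'_1 = {axioms, F(0)}
    induction_step = (axioms + (F("n"),), F("n+1"))  # S'_2 = {axioms, F(n), F(n+1)}
    return base_step, induction_step


base, step = inductive_decomposition(["A1", "A2"], lambda t: f"F({t})")
print(base)  # (('A1', 'A2'), 'F(0)')
print(step)  # (('A1', 'A2', 'F(n)'), 'F(n+1)')
```

The induction hypothesis F(n) appears only among the hypotheses of the induction step, matching the definition of $S_2$.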

The goal of ITP is (i) to perform the inductive decomposition, i.e., to specify correctly the last formulae of $S'_1$ and $S'_2$, as well as the form of the induction hypothesis. This amounts to generating a correct induction scheme for a given domain and theorem; for general well-founded domains, the inductive decomposition is not as trivial as the above decomposition for natural numbers might suggest. The goal of ITP is also (ii), starting from $S'_1$ and $S'_2$, to provide the corresponding sequences $S_1$ and $S_2$ for a given theorem.

The first task differentiates ITP from classical theorem proving. While proving theorems by induction has long been recognized as a useful way of solving some mathematical problems [29], interest in mechanizing ITP grew strongly when people realized its applicability to program verification and to the construction of programs from formal specifications. Let us first recall how ITP applies to program construction when the desired program f is specified formally by something like: for any input x satisfying the input condition P, the output z must be such that the input-output relation Q(x, z) holds. An interesting feature of ITP is that if we are able to prove by induction a theorem of the form
that  if  we  are  able  to  prove  by  induction  a  theorem  of  the  form 

$\forall x\, (P(x) \Rightarrow \exists z\, Q(x, z))$   (ST)

usually called a specification theorem, then a by-product of such a proof is a program
f satisfying the formula


$\forall x\, (P(x) \Rightarrow Q(x, f(x)))$   (VFP)




The task of constructing programs specified by (ST) is called Program Synthesis (PS). The formula (VFP) also shows the role of ITP in program verification: once we have a program f and a formal specification, we can use ITP to check whether our program satisfies the given specification.
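To show why a constructive inductive proof of (ST) yields a program, consider a hypothetical specification chosen by us, not taken from the paper: P(x) says that x is a nonnegative integer, and Q(x, z) says that z = 0 + 1 + ... + x. The base step must exhibit a witness for x = 0, namely z = 0. The induction step turns a witness z for n into the witness z + (n + 1) for n + 1. Reading these two cases as a definition by recursion gives the program f. Checking Q(x, f(x)) on finitely many inputs is only testing; it is not a proof of (VFP).

```python
def Q(x, z):
    """Input-output relation of the example specification: z = 0 + 1 + ... + x."""
    return z == sum(range(x + 1))


def f(x):
    """Program read off an inductive proof of  forall x exists z Q(x, z).

    Base step: the witness for x = 0 is z = 0.
    Induction step: from a witness z for n, build the witness z + (n + 1) for n + 1.
    The recursion is unrolled into a loop to avoid Python's recursion limit.
    """
    z = 0                # witness for the base case x = 0
    for n in range(x):   # apply the induction step for n = 0, ..., x - 1
        z = z + (n + 1)
    return z


# Testing (VFP) on sample inputs; a proof would cover every x.
assert all(Q(x, f(x)) for x in range(100))
print(f(10))  # 55
```

In a synthesis setting, only the proof of (ST) is required; the code of f is extracted from the witnesses constructed in the base and induction steps.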

This completes a rough description of the goal and use of ITP. Let us say a few words on the "art" of proving theorems by induction. Mathematicians, logicians and many computer scientists usually know how to prove a theorem by induction when they need to. This is quite interesting since, to the best of our knowledge, they do not learn at school how to prove theorems by induction. Of course, they have seen some inductive proofs at school, but they did not take a course presenting mechanisms for proving theorems by induction that would allow them, on the basis of these mechanisms alone and without relying on acquired experience, to prove any theorem that the professor himself is able to prove. In other words, people presently learn how to prove theorems by induction only through experience. This indicates that there may be some difficulties in automatizing ITP, i.e., in implementing a system able to prove theorems by induction. In the next section we consider the behavior of systems implementing ITP.

2  Formal  Versus  Human  Creativity 

2.1 Formal Creativity in the Automatization of ITP

The notion of formal creativity may be pictured as the behavior of an ordinary (in terms of intelligence) but diligent (in terms of work) student. Such a student is able to prove only those theorems whose proof "routines" he has learned at school. In order to prove a theorem, he just follows these routines. In order to be formally creative, an ITP system must implement these routines. In other words, formal creativity in ITP comprises the ability to provide correct solutions (i.e., as specified in section 1, the inductive decomposition plus the sequences $S_1$ and $S_2$). It also includes the ability to understand and analyze failures, as well as the ability to propose new solutions for recovery based on such analysis. This recalls what we described in the introduction as the standard, routine work of a mathematician.

We have mentioned above that people learn how to prove theorems by induction through experience. This somewhat hints at the absence of an ITP methodology that allows one to prove non-trivial theorems without understanding the semantics behind them. In other words, learning how to prove theorems by induction would mean being able to treat the functions and predicates occurring in a theorem as mere symbols, with an ITP methodology describing procedurally how to manipulate these symbols, and the definitions corresponding to them, so that a formal proof of the theorem is obtained. The reader may better understand the difficulty of such syntactic proof manipulation by trying the following exercise.

1. Take a non-trivial theorem, for instance Theorem 18 from [30].
2. Use abstract function and predicate names, such as $f_1, f_2, \ldots$ and $P_1, P_2, \ldots$, instead of those used by Skolem.
3. Define $f_1, f_2, \ldots$ and $P_1, P_2, \ldots$ as Skolem does, and then simply prove the resulting formula by induction, without knowing what the theorem is about and without knowing what $f_1, f_2, \ldots$ and $P_1, P_2, \ldots$ mean.

In case of success, if still not convinced of the difficulties of automatizing ITP, the reader should then try to prove the prime factorization theorem, i.e., the problem of finding a decomposition of a given natural number into its prime factors, using axioms from [22]. In case of success, the reader may compare both proofs. We claim that there will be little similarity between them, or even between parts of them.

2.2  Human  Creativity  in  the  Automatization  of  ITP 

Human creativity, concerning ITP, may be pictured as the behavior of a very bright, almost ingenious student. Such a student is able to provide unexpected and highly interesting proofs, showing that he masters not only ITP as a formal technique but also the domain in which he proves theorems. An implementation of a human creative system is able to do the same as this ingenious student.

Obviously, a human creative (HC) system is more appealing than a formally creative (FC) system. Therefore, if we say that our methodology tends to be formally creative while other existing approaches are nearer to human creative systems, the reader may want to stop reading now. However, there is a small drawback in the human creativity of existing approaches. We shall say more about it in section 3 and in the Conclusion. For now, let us only mention that this drawback manifests itself in the fact that, until now, no one has proposed an ITP system that always behaves, as far as results are concerned, like an ingenious mathematician. It will become clear later that, at present, ITP systems providing interesting proofs require some guidance from a user. Therefore, we allow ourselves to say that an automated system depends on human creativity if the user has to guide the system in some sense.

2.3 Some More Evidence on User-Independent Systems

It is obvious that if a system is really automated, a user does not need to know how this system works. It is also obvious that an automated system has to be able to return the output expected by the user. The problem of developing a user-independent system providing interesting proofs is a problem of implementing creativity, which, as we all know by now, is far from being easily achieved. However, let us suppose that we decompose the desired task into two tasks: 1. First develop a completely automated system able to prove theorems by induction. (We do not forget here the undecidability of theorem proving; we speak of a system that will do "its best".) This is the task of implementing "formal creativity", or, in other words, of finding a systematic way to prove theorems by induction. If this task is achieved, then 2. Study how creativity can be introduced into this system.

We believe that a solution to the first task may be an FC system providing standard outputs. This means that an FC system will also provide solutions to problems whose solutions are not known to the user. Standard outputs mean that if the user knows a solution, an FC system will generally not provide a solution identical to that of the user [11].

Who may now be interested in an FC system? Students might. Interesting proofs are not always necessary for students. A professor may judge that his student is not very clever since he has not discovered a tricky proof, but he cannot deny that the student has proved the given theorem. Therefore, if students could learn a systematic way of proving theorems by induction, they would have more time to focus on other important things. For the same reason, an FC system could be of some use to computer scientists.




3  Representatives  of  Research  on  Automatization  of  ITP 

In section 2.1 we gave the reader the opportunity to check by his own efforts that the automatization of ITP, user-dependent or not, is not a simple problem. We have also spent considerable time on a characterization of FC and HC systems, and we have mentioned that while the goal of our approach is to obtain an FC system, other existing approaches rather lead towards HC systems. However, we have not named these "other" approaches in ITP, and thus the reader might feel that our division of the existing approaches into ours and others comes simply from insufficient knowledge of existing work on the automatization of ITP. Here are some hints at the reasons for insisting upon our division.

Let us start with the natural division of ITP as it is currently accepted in Computer Science. To explain the origin of this division, as we see it, we first have to recall that specification theorems contain existential quantifiers. The difficulty with this kind of theorem lies in the fact that PS requires constructive proofs for ST, while for purely universally quantified theorems there is no need for a proof to be constructive. Automatizing the proof of theorems containing only universal quantifiers is already a difficult problem, and the requirement of constructive proofs for specification theorems seems to make the problem even more difficult. Therefore, it is not surprising that investigators divided their efforts in building inductive theorem provers.

On one side, purely universally quantified theorems are treated by what is now called automated ITP. Let us mention here only the representatives of the different approaches in this stream (in [20] we give the most detailed description of existing approaches). [3] is, undeniably, the representative of all the work considered as an improvement or extension of the approach developed by Boyer and Moore. We choose [6] to represent the building of constructivist or intuitionistic formalisms suitable for representing inductive proofs, and [25] is for us a representative of the approach using rewrite systems (or the inductionless approach, as it is sometimes called) to prove theorems by induction.

On the other hand, specification theorems are treated by program synthesis from formal specifications. Manna and Waldinger [26] performed the pioneering work in PS. They are usually considered representatives of the deductive approach to ACP, i.e., the approach that constructs inductive proofs for specification theorems. [11] presents representatives of particular approaches within this deductive one.

Thus, this classical division into two streams, with respect to the difficulty of automatizing ITP, is not surprising. What is surprising, from a naive point of view, is that the division is so strong that, as one could conclude from the literature when paying attention to it, the two streams exist separately: people working on universally quantified formulae develop methods that lead only to purely universally quantified subproblems, while PS approaches mostly limit themselves to generating subproblems of the form ST. Consequently, the mechanisms developed in these two streams share a common feature: they are not very suitable for theorems of the other stream, even if there are attempts to use some common computational techniques in one or the other stream (see more in [11]).

With respect to this classical division, our approach has its origin [18] in the second stream and falls under the class represented by [26]. However, it very soon became clear [19] that focusing on proving specification theorems rather than synthesizing programs brings our approach under the scope of the first stream as well. The previous sections allow us to name the main feature of our approach: formal creativity. We may now say that our work is a representative of the research on a formally creative implementation of ITP.

4 Constructive Matching: A Formally Creative Methodology for ITP

4.1  Goal 

The goal of CM is to capture conceptually the problems in ITP and to provide procedural solutions to these problems. More exactly, the goal of CM is to determine which problems are innate to ITP, which problems are linked to a user-independent implementation of ITP, and how to solve these two types of problems so that the final implementation of our approach provides a user-independent system. We have previously explained that this goal amounts to nothing but developing a methodology for ITP.

4.2  Restrictions 

In the extended version of this paper [11] we present the basic restrictions, and in [10] more details are given. Let us assure the reader that the restrictions of CM do not make its field of applicability trivial (as illustrated in section 4.5), and that no ITP methodology has been developed for this particular field.

4.3 Formal Creativity of CM: Where Does It Come From?

To answer "Why is CM formally creative?" we have to explain why CM provides an answer to two problems, presently recognized as major ones, concerning the automatization of ITP (cf. the call for the AAAI-93 workshop on ITP): (1) How to generate suitable induction hypothesis schemata? (2) How to generate missing lemmas?

In CM these two problems are intertwined in the sense that our answer to the second question gives us an almost trivial solution to the first one. The solution to the first problem is based simply on the particular form of the structural induction principle we use (see [19] or [24]). When none of the induction schemes generated in advance applies to a particular theorem, the answer to (2) helps to find the missing schemes.

If our solution for (1) is so simple, one may wonder what prevents other approaches from accepting our forms of induction hypotheses. Because of the classical division between ITP and PS approaches, we describe the main difficulties separately. Roughly speaking, ITP approaches cannot accept our solution to (1) because induction schemes generated in CM may contain universal quantifiers. The universal quantifiers in induction hypotheses may lead to a subproblem containing an existential quantifier, and such subproblems, as mentioned above, are out of the scope of ITP approaches. Let us suppose now that ITP approaches decide to accept these universally quantified hypotheses and to use PS approaches when they meet such an existentially quantified subproblem. Unfortunately, this alone will not be sufficient to give ITP approaches a complete answer to (1) since, as we have already mentioned, the set of CM induction schemes may not be complete. Therefore, ITP approaches would need to complete the schemes proposed by CM in some way. We strongly suspect that their solution will necessarily be similar to that of CM, which relies on linking this problem to (2), in particular to the problem of proving implications, i.e., formulae of the form $A \Rightarrow B$ (see more in [11]). In other words, we think that as soon as ITP approaches extend their scope to existential quantifiers and are able to prove implications, they will have their solution to (1).

Now, PS approaches may have objections to our solution for (1), simply because it does not guarantee that programs synthesized while proving a specification theorem with CM induction hypotheses will be as efficient as those a programmer would be able to obtain. Once a standard proof leading to a standard program is obtained, CM can attempt to find a better proof in the way described in [15]. However, for lack of time and financial means, "better proofs" are out of the scope of our present research, simply because we first want to complete our research on "standard proofs". Nevertheless, as has already been mentioned, proofs leading to the most efficient programs are at present a matter of human creativity.

Concerning the CM solution to (2), we attribute our present success to the mechanism we have developed for proving atomic formulae [19], the so-called CM-formula construction (CMC for short). Of course, CM applies not only to proving atomic formulae; however, as we see it now, our "luck" started with CMC.

To understand why a mechanism for proving atomic formulae is fundamental for the automatization of ITP, one would have to go back to logical connectives and de Morgan laws (see [20]), finally realizing that once we know how to prove atomic formulae and certain implications, we have almost solved the problem (see also [16]). Let us suppose now that the reader accepts our stress upon atomic formulae, and let us explain where our mechanism differs from the mechanisms used in other ITP and PS approaches. In particular, we shall restrict our description to ITP approaches, as PS approaches usually either do not have a unified mechanism to prove atomic formulae, or their mechanisms are in some sense (concerning the explanatory character mentioned below) similar to those of ITP approaches. The basic difference between our approach and other ITP approaches is that CM has a single mechanism both for formulae containing purely universally quantified variables and for formulae containing existentially quantified variables. In fact, we treat formulae not containing existentially quantified variables as if they contained some existentially quantified variables (see the notion of an abstract argument in the CMC [20]). Moreover, the CMC is a procedure, so at each step it knows what has to be done and what has to be obtained. This is important for the discovery of lemmas in two ways.

First, in case of a failure, CMC provides explanations suitable for further analysis,
and part of the missing lemmas generated in CM are the result of such an analysis. In
other words, the second difference between CM and other approaches is that our basic
mechanism provides suitable explanations. Here, the reader may argue that some
ITP approaches can be said to provide explanations. For instance, a critical pair in
rewrite systems may be considered as an explanation. However, very often only an
expert in rewrite systems is able to capture the essence of a critical pair expressing a
failure, and thus automating the recovery process requires expertise not only in ITP
but in rewrite systems as well. By contrast, CMC provides explanations that allowed
us, as developers, to formalize the recovery process from failures of CMC. Moreover,
this formalization relies on understanding problems in ITP only, not in rewrite
systems.

Second, the tools used by CMC are finite, non-trivial procedures that may be
theoretically incomplete. Thus, CMC may fail due to the incompleteness of these
tools. This indicates another way of generating missing lemmas in CM: such lemmas
serve to complete solutions. Let us note that, for this purpose, CM also uses
inductive tools similar to those developed in Machine Learning. The user-independence
of these inductive tools is ensured by a deduction-induction cycle, described in [10].
We may say that the use of unusual inductive tools to automatize ITP is one more
difference between our approach and the others.

Thus, we have described above two ways of generating lemmas in CM. The first
one corresponds to recovering from failures specific to CMC. The second one
corresponds to recovering from failures specific to the tools used by CMC. Failure
analysis [9] takes care of failures that are assumed to be a consequence of the
incompleteness of our basic CM-construction techniques. Various tools and techniques
(inductive as well as deductive) are proposed for recovery.

There  are  two  other  ways  of  generating  lemmas. 

Since CM treats not only atomic formulae, the third natural way to generate
lemmas is to analyze the problems that may arise while proving other forms of
formulae by induction; as we have mentioned above, it is important to know how to
deal with implications [16].

Finally, the last way in which lemmas are generated in CM rather corresponds to
implementing certain kinds of tricks, whose application may be suggested by a mere
syntactic analysis of a situation met during a proof, or which express usual
attempts to simplify inductive proofs, such as finding a non-inductive proof. These
heuristics can in no way be characterized as leading to the fancy and interesting
proofs typical of clever people, simply because CM heuristics are not so specific as
to rely on the semantics of particular background knowledge. This is why we say that
CM heuristics are logically based, and this is why CM cannot be described as
intelligent: its output will very seldom be the same as the output of a clever human.
Since these heuristics are considered more or less as a way of speeding up the
theorem proving procedure, when the application of a heuristic from this set fails, the
failure is not considered significant. This means that CM distinguishes between
significant and non-significant failures, similarly to the ordinary work of a
mathematician pictured in the introduction. This partly explains why CM is able to
discard a failure to prove a lemma generated in the course of a proof of a given
theorem, and to look for a recovery, for instance by attempting to use another tool or
by reformulating the problem leading to this lemma. It also illustrates that capturing
syntactically the development of an inductive proof plays a non-trivial role in CM;
the concepts of environment and environment analysis are related to this part of CM,
which, for brevity, is not described further here.
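To make the distinction between significant and non-significant failures more concrete, here is a minimal Python sketch under our own simplifying assumptions; it is not the authors' implementation and does not reproduce CM's actual control structure. Every name (`Attempt`, `prove`, the attempt labels and the goal `lemma_1`) is hypothetical. Each attempt is tagged as significant or not: a failed speed-up heuristic is silently discarded, while a significant failure is logged as material for failure analysis before the next tool or reformulation is tried.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    name: str
    run: Callable[[str], Optional[str]]  # returns a proof text, or None on failure
    significant: bool                    # should a failure be analysed?


def prove(goal: str, attempts: List[Attempt], log: List[str]) -> Optional[str]:
    """Try each attempt in order; keep only significant failures for analysis."""
    for attempt in attempts:
        proof = attempt.run(goal)
        if proof is not None:
            return proof
        if attempt.significant:
            # Material for failure analysis (e.g. a missing lemma to generate).
            log.append(f"{attempt.name} failed on {goal}")
        # A failed speed-up heuristic is discarded without further analysis.
    return None


# Hypothetical attempts on a hypothetical goal.
attempts = [
    Attempt("non-inductive proof", lambda g: None, significant=False),
    Attempt("induction", lambda g: None, significant=True),
    Attempt("induction after reformulation", lambda g: f"proof of {g}", significant=True),
]
log: List[str] = []
result = prove("lemma_1", attempts, log)
print(result)  # proof of lemma_1
print(log)     # ['induction failed on lemma_1']
```

Keeping the significance flag on each attempt, rather than in the prover, mirrors the idea that it is the kind of heuristic, not the goal, that decides whether a failure deserves analysis.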

This explains where the formal creativity of CM comes from. In [11] we also describe
the basic results we have achieved in a user-independent automatization of ITP.

Let us note that our research on CM is not yet completed. However, Section 4.5
justifies our conviction that CM is a real step towards the user-independent
automatization of ITP, and may already be considered as the basis of a
methodology for ITP, in the sense that further research may only improve and extend
our basic results. In other words, what we have done may still be considered
imperfect or not worked out enough, but it cannot be considered useless or unsound
for a methodology of ITP. Put differently, our results may already help students to
produce standard proofs.




4.4  Implementation 

Our system PRECOMAS, presented in [23], [21], is an experimental implementation of
our CM methodology. Due to a lack of manpower, the present state of the
implementation reflects only part of our approach (see more in [11]).

4.5  Performance 

The most important experiment with CM is the solution of the prime factorization
problem presented in [22]. As we explain there, no other ITP or PS approach is able to
solve this problem from the axioms presented there without a clever user, while CM, or
an ordinary student following CM, succeeds from this set of axioms as well as from the
sets of axioms used by other approaches. Let us mention that CM generates about one
hundred lemmas, some of which it fails to prove, but it succeeds in recovering from
these failures. In order to show that we did not succeed simply because we are well
acquainted with natural numbers, let us mention the success of CM in solving a rather
non-trivial planning problem, as shown in [10]. Moreover, the CM solution is
different from the solution proposed in [27], for which a human expert is claimed to be
necessary. Other successful experiments and references to them are given in [10].

5  Further  Work 

We are still working on the failure and environment analysis of CM. The experimental
work we have performed, both with PRECOMAS and by hand, validates the utility of our
methodology and the benefits of a software implementation. Unfortunately, we
presently do not have the resources for a complete implementation, which, by our
estimate, would require about 20 man-years.

Our experience in building an FC system and all the research we have done in ITP
lead us to believe that it is possible to extend the formal creativity of CM towards a
certain degree of human creativity (see [22] and [10]). We confess that, in order to
progress more quickly, help from mathematicians, logicians and computer scientists
would be welcome for this extension.

Conclusion 

In this paper we introduce an unusual, and non-trivial, division of approaches to the
automatization of ITP. We explain that our approach is representative of formally
creative approaches, while other existing approaches fall into the class of human
creative approaches. We explain that the formal creativity of our approach results
from providing non-standard solutions to the problems that, in our opinion, had to be
solved first for a user-independent automatization of ITP.

In contrast to CM, which is better classified as non-intelligent ITP aiming at standard
outputs only, other existing approaches often behave like a clever student, or the
proofs obtained in these approaches are in some sense interesting [4]. However, these
approaches may fail where our methodology succeeds. The reasons for their failures
become clear as soon as one realizes that these approaches either do not tackle the
problems of ITP as problems of implementing a formal technique, but rather as
problems of implementing clever heuristics for proving a particular class of theorems;
or they provide a formalism suitable for expressing the user's own heuristics, but do
not propose mechanisms that bring this formalism to life; or they are able to carry
out a proof completely only when the user has been clever enough to provide suitable
knowledge in advance, for instance, lemmas expressed in a form suitable for the
techniques of the approach. We illustrate this in [22], where we compare "the ability"
of three approaches to prove the prime factorization theorem.

The paper also shows that our approach is complementary to the other ones. In
other words, both formally and human creative approaches have their place in the
automatization of ITP. The former approaches tend to provide a methodology for
standard ITP. The latter ones tend to fulfil the goal of Artificial Intelligence by
automating intelligent behavior whenever possible.

References 

3. R. S. Boyer, J S. Moore: A Computational Logic; Academic Press, 1979.

4. R. S. Boyer, Y. Yu: Automated Correctness Proofs of Machine Code Programs for a
Commercial Microprocessor; in: CADE 11 Proc., Springer, 1992, 416-430.

6. R. L. Constable, T. B. Knoblock, J. L. Bates: Writing Programs that Construct Proofs;
Journal of Automated Reasoning, vol. 1, no. 3, 1985, 285-326.

9. M. Franova, A. Galton: Failure Analysis in Constructive Matching Methodology: A
Step Towards Autonomous Program Synthesizing Systems; in: R. Trappl (ed.):
Cybernetics and System Research '92; 1553-1560.

10. M. Franova, Y. Kodratoff, M. Gross: Constructive Matching Methodology: an
extension and an application to a planning problem; forthcoming, 1993.

11. An extended version of this paper; forthcoming, 1993.

12. M. Franova, Y. Kodratoff: Practical Problems in the Automatization of Inductive
Theorem Proving; Rapport de Recherche No. 752, L.R.I., 1992.

13. M. Franova, Y. Kodratoff: How to Clear a Block with Constructive Matching
Methodology; in: Proc. IJCAI'91; Morgan Kaufmann, 1991, 232-337.

14. M. Franova, Y. Kodratoff: Predicate Synthesis from Formal Specifications; in: B.
Neumann (ed.): ECAI 92 Proceedings; John Wiley & Sons Ltd., 1992, 87-91.

15. M. Franova, Y. Kodratoff: Predicate Synthesis from Formal Specifications: Using
Mathematical Induction for Finding the Preconditions of Theorems; to appear in
proceedings of NIL'92, available as Rapport de Recherche No. 781, L.R.I., 1992.

16. M. Franova, Y. Kodratoff: Simplifying Implications in Inductive Theorem Proving:
Why and How?; Rapport de Recherche No. 788, L.R.I., 1992.

17. M. Franova, L. Popelinsky: Constructing formal specifications of predicates;
forthcoming, 1993.

18. M. Franova: Program Synthesis and Constructive Proofs Obtained by Beth's
Tableaux; in: R. Trappl (ed.): Cybernetics and System Research 2; 1984, 715-720.

19. M. Franova: CM-strategy: A Methodology for Inductive Theorem Proving or
Constructive Well-Generalized Proofs; in: Proc. IJCAI'85, 1214-1220.

20. M. Franova: Fundamentals of a new methodology for Program Synthesis from Formal
Specifications: CM-construction of atomic formulae; Thesis, 1988.

21. M. Franova: Precomas 0.3 User Guide; Rapport de Recherche No. 524, L.R.I., 1989.

22. M. Franova: A constructive proof for prime factorization theorem: A result of putting
it together in constructive matching methodology; Rapport de Recherche No. 780, L.R.I., 1992.

23. M. Franova: An Implementation of CM; Proc. of ISSAC'90, 16-23.

24. M. Franova: Generating induction hypotheses by Constructive Matching
methodology for ITP and PS revisited; Rapport de Recherche No. 647, L.R.I., 1991.

25. H. Kirchner: Preuves par Completion; Thesis, 21 June 1985.

26. Z. Manna, R. Waldinger: A Deductive Approach to Program Synthesis; ACM Trans.
on Prog. Languages and Systems, Vol. 2, No. 1, January 1980, 90-121.

27. Z. Manna, R. Waldinger: How to Clear a Block: A Theory of Plans; Journal of
Automated Reasoning 3, 1987, 343-377.

29. G. Polya: How to Solve It; 1957.

30. T. Skolem: The foundations of elementary arithmetic established by means of the
recursive mode of thought, without the use of apparent variables ranging over infinite
domains; in: J. van Heijenoort: From Frege to Gödel; 1967, 302-334.


Representing  the  Knowledge  used  during  the  Requirement 
Engineering  Activity  with  Generic  Structures 

Georges  Grosz,  Colette  Rolland 

Equipe C. ROLLAND, C.R.I., Université Paris 1-Sorbonne, 17 Place de la Sorbonne,
75005 Paris, FRANCE; email: {grosz, rolland}@masi.ibp.fr

Abstract: Abstracting informal requirements in terms of formal
specifications of the future information system is the aim of the Requirement
Engineering activity (RE). It is a cognitive activity mostly based on
analogical reasoning. Skilled designers reuse the experience gained from
previous designs when they recognise similarities between different
applications. In order to build truly intelligent systems that efficiently support
RE, we believe it is mandatory to represent the knowledge underlying this
experience. Using this representation allows part of the design process to be
reused and thus brings improvements in quality as well as productivity. Our
solution relies on the concept of generic structure. It is based on the following
observation: many real-world phenomena, apparently different, are described
in the same way. A generic structure is the expression of this common
denominator; it describes behavioural as well as static aspects. In this paper,
we present the concept of generic structure and illustrate how it can be used
with a simple example.

1. Introduction

The term Requirements Engineering (RE) [1] is used for the part of the systems
development effort that involves investigating the problems and requirements of the
user community, the goal being to develop a formal specification of the desired
Information System (IS). This specification must be validated against the end-users'
needs. The succeeding development phase, where this specification is used to design
and implement a working system which is verified against the specification, may then
be called Design Engineering. Figure 1 shows the relationship between Requirements
Engineering and Design Engineering. Thus, RE aims at transforming informal IS
requirements into formal (or conceptual) specifications. In other words, RE is the
activity during which analysts, designers and, hopefully, end-users try, all together, to
understand, delimit and describe in formal terms the most suitable software to
efficiently help end-users and decision makers in their day-to-day missions.


Figure 1: Information system development process




The RE activity has been recognised to be a crucial step in software development as well
as in database development [2, 3]. These authors agree that when an error occurs during
RE, its correction is extremely expensive and sometimes even jeopardises the entire
software.

RE is a knowledge-intensive task. To improve CASE capabilities, we believe it is
mandatory to represent, as much as possible, the knowledge used by designers during the
RE activity. Using this knowledge, designers will be able to concentrate on the
perception of the real world. The paper deals with this issue: it tries to understand what
kind of knowledge can be represented and proposes some solutions based on the concept
of generic structure.

Based on this hypothesis, building a conceptual specification amounts to using a library of
generic structures. In this new perspective, the designer does not have to redo
conceptualisation efforts; he can concentrate on the perception and understanding of the
end-users' requirements.

The paper is structured as follows. The next section presents the concept of generic
structure by showing similarities between different specifications and how a generic
structure is derived. Section 3 describes examples of generic structures. In Section 4, we
present the construction of a semi-formal specification using generic structures. This
example describes the stock management in a company. Section 5 is dedicated to related
work. We conclude with Section 6.

2. Generic structure: the very idea

Abstracting a formal specification from a set of requirements given by end-users is
mostly a matter of analogical reasoning [4]. It is not a fully creative task. For the sake of
efficiency and productivity, an experienced designer reuses previous experience. When he
recognises similar phenomena between the current requirements he is working on and
previous work he did, he remembers the solution he chose and adapts it to the current
problem. The knowledge used during this process is what generic structures aim at
representing. The RE activity can be supported by a library of predefined generic
structures adequate to some domain characteristics.

The concept of generic structure results from the two following observations. First,
there exist common denominators between conceptual specifications describing similar
phenomena. Second, it is possible to establish a hierarchy between classes of domain
phenomena. These two observations are detailed in turn.

While studying different conceptual specifications, one can realise that similarities
exist. Clearly, labels are different, but common patterns can be highlighted. For
instance, there is something in common between the conceptual descriptions of a car
rental company, a library and a room reservation system. In a car rental company, cars
are rented by customers, who bring them back; when no car is available, the request is
put on a waiting list, etc. In a library, copies of books are checked out by subscribers,
who return what they borrowed; when a book is not available, the loan request is put on
a waiting list, etc. In a hotel, rooms are reserved; when patrons leave a room, it
becomes available again; when no rooms are available, the request is put on a
waiting list, etc. In these different applications, the car, the copy of a book and the
room belong to the same domain class, the "resource" class: customers consume
resources and bring them back, and when they cannot have what they want, the request is
put on a waiting list. A close look at the conceptual specifications shows that they have
a common underlying structure. As shown in Figure 2, this is what is called a generic
structure. A generic structure describes static as well as dynamic properties associated
with a class of domain phenomena.


The process of identifying generic structures is a difficult abstraction process. It requires
detecting similar patterns in different specifications, identifying the underlying domain
class, and renaming the elements of the generic structure in generic terms.
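The abstraction step can be illustrated with a small Python sketch, under assumptions of our own: each specification is reduced to a set of (actor, event, entity) triples rather than the graphical model used in the paper, and the renaming tables are hypothetical. Once the car rental and library specifications are renamed into generic terms, they coincide, which is the common denominator a generic structure is meant to capture. Finding the renaming is the hard, human part of the process and is simply supplied by hand here.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (actor, event, entity)

car_rental: Set[Triple] = {
    ("customer", "rent", "car"),
    ("customer", "bring_back", "car"),
    ("customer", "queue_request", "car"),
}
library: Set[Triple] = {
    ("subscriber", "check_out", "copy"),
    ("subscriber", "return", "copy"),
    ("subscriber", "queue_request", "copy"),
}

# Hypothetical renamings from application labels to generic terms.
car_to_generic = {"customer": "consumer", "rent": "consume",
                  "bring_back": "give_back", "car": "resource"}
library_to_generic = {"subscriber": "consumer", "check_out": "consume",
                      "return": "give_back", "copy": "resource"}


def abstract(spec: Set[Triple], renaming: Dict[str, str]) -> Set[Triple]:
    """Rename every element into generic terms; unmapped labels are kept."""
    return {tuple(renaming.get(x, x) for x in triple) for triple in spec}


generic_car = abstract(car_rental, car_to_generic)
generic_library = abstract(library, library_to_generic)
print(generic_car == generic_library)  # True
```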

3. Generic structures: some examples

RE  involves  two  kinds  of  knowledge:  domain  knowledge  and  model  knowledge. 

Domain knowledge deals with application domains. It comprises information about the
application domain objects and the rules, either static or behavioural, which describe
the different possible states of objects. End-users are the real experts of the application
domain. However, they are usually not able to build conceptual specifications since they
are not familiar with model knowledge.

Model  knowledge  deals  with  conceptual  formalisms.  It  is  the  knowledge  about  a  set  of 
concepts  and  the  rules  defining  how  these  concepts  can  be  used  to  construct  a  conceptual 
model.  It  is  the  designer's  work  to  represent  the  application  domain  using  a  conceptual 
formalism.  Model  knowledge  is  also  about  being  able  to  transform  a  specification 
expressed  with  one  model  into  an  equivalent  one  expressed  with  another  model  (e.g., 
translating  an  E/R  diagram  into  an  Object-Oriented  schema). 

Generic structures combine both kinds of knowledge. They are either the expression of
classes of domain phenomena using a model, or the expression of possible transformations
from one model to another. The former type is called "Domain dependent" and the latter
"Model dependent". In this paper, we focus on domain dependent generic structures.
Model dependent generic structures are described in [5].

3.1.  Domain  dependent  generic  structures 

In [6], many different types of domain dependent generic structures are described. We
define simple basic generic structures (dealing with state changes and time changes) and
also compound generic structures (dealing with resources and with the management of
waiting orders concerning resources). In this paper we only provide a single example of
a domain dependent generic structure: the generic structure describing the general
behaviour of a resource. In the following, we first define the concept of resource more
precisely and informally describe its generic behaviour. We end this section with the
presentation of the corresponding generic structure.




A resource is often dealt with in information systems to denote an available supply that
can be drawn upon when needed. In other words, a resource is an entity which is required
to be available for the execution of a process, and becomes unavailable when the process
is using it. In a library, for instance, the entity "copy of a book" is a resource for the
"loan" process, while the entity "book" itself is not a resource because it is not involved
in the loan process. In a personal computer network with a shared printer, the entity
"printer" is a resource, while the process is the "printing of a document".

When a resource is available, it can be consumed when requested by a consumer. As
soon as it is consumed, it becomes unavailable for the other possible consumers, because
a resource is for the exclusive use of the consumer who obtained it. When a resource
becomes unavailable (curative management) or will become unavailable (preventive
management), it may be useful, under some conditions, to ask the supplier for more.
For instance, when the inventory of a product becomes low or zero, an order may be
automatically sent to the supplier. Re-ordering may have no meaning if the availability
of a resource is set once and for all at creation time (e.g., a printer in a local network). A
resource must be created (its entry into the system corresponds to a creation). Its supplier
can be internal to the application (e.g., a company building products) or external (e.g.,
the suppliers of a supermarket). Conversely, it must be possible to delete a resource.
Figure 3 shows the generic structure corresponding to the behaviour of a resource.

To describe this generic structure, we use a model defined in [7]. This model is based on
the notions of actor, event and entity, which are represented by double rectangles,
pentagons and circles respectively. A dashed rectangle stands for either an actor or an
entity. In this model, the notion of event includes both the modifying action and the
triggering conditions. Events are triggered either by actor(s) or entity(ies); events modify
either actor(s) or entity(ies), which are the target of the pentagon. It is also possible to
express static relationships between entities and/or actors by drawing a line between them.


Figure  3  :  generic  structure  for  the  management  of  a  resource 


The entity "resource" is modified by three events: "consume", "create" and "delete", which
are respectively triggered by "consumer", "producer" and "operator". The entity "resource"
triggers the event "ask", which modifies the "producer". On top of the behavioural
properties related to a resource, a structural pattern is identified. A resource must have the
following properties: reference, wording, creation date and a state having a value in
{"available", "unavailable"}.
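One possible executable reading of this generic structure is sketched below in Python. It is an illustration under stated assumptions, not the authors' model: the class `ResourceManagement`, the `reorder_when_unavailable` flag and the sample data are hypothetical, and the "ask" event is reduced to recording a request for the producer. The flag stands for the "under some conditions" clause: re-ordering would be enabled for a stock product and disabled for a shared printer.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Dict, List


class State(Enum):
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"


@dataclass
class Resource:
    # Structural pattern: reference, wording, creation date and state.
    reference: str
    wording: str
    creation_date: date
    state: State = State.AVAILABLE


class ResourceManagement:
    """Hypothetical reading of the generic structure for a resource."""

    def __init__(self, reorder_when_unavailable: bool) -> None:
        self.resources: Dict[str, Resource] = {}
        self.requests_to_producer: List[str] = []  # effect of "ask" on the producer
        self.reorder_when_unavailable = reorder_when_unavailable

    def create(self, reference: str, wording: str, when: date) -> None:
        """Event "create", triggered by the producer."""
        self.resources[reference] = Resource(reference, wording, when)

    def consume(self, reference: str) -> None:
        """Event "consume", triggered by a consumer."""
        resource = self.resources[reference]
        if resource.state is not State.AVAILABLE:
            raise ValueError(f"{reference} is unavailable")
        resource.state = State.UNAVAILABLE  # exclusive use by this consumer
        if self.reorder_when_unavailable:
            self.ask(reference)  # curative management: the resource triggers "ask"

    def ask(self, reference: str) -> None:
        """Event "ask", triggered by the resource, modifying the producer."""
        self.requests_to_producer.append(reference)

    def delete(self, reference: str) -> None:
        """Event "delete", triggered by an operator."""
        del self.resources[reference]


stock = ResourceManagement(reorder_when_unavailable=True)
stock.create("P-01", "box of paper", date(1993, 6, 15))
stock.consume("P-01")
print(stock.resources["P-01"].state.value)  # unavailable
print(stock.requests_to_producer)           # ['P-01']
```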

The study of the different types of resource management leads us to define different
categories of resources. Some are consumable (for instance, the products bought by
customers in a stock management system); among those, some are perishable (fresh
products in a grocery) and others are non-perishable (hardware or electronics). There
are also re-usable resources (for instance, the cars in a rental company). In this case, one
can distinguish between renewable resources (books in a library or cash in a bank) and
repairable resources (cars from a rental company or planes from an airline). A resource
can be characterised by more than one of the mentioned characteristics. Figure 4 shows
how the different characteristics are organised in an "is_a" hierarchy. The symbol,
borrowed from the NIAM notation, expresses that the two classes are disjoint.


Figure 4: the "is_a" hierarchy among resources


For each class, we defined a generic structure describing its specific behavioural and
structural properties; they are detailed in [6]. We built them in the same way as the
general one (Figure 3), i.e., by manually defining the particular behaviour and properties
and expressing them using generic structures. Each class of the
hierarchy inherits the behaviour and properties of its superclasses.
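The inheritance of behaviour and properties along the "is_a" hierarchy can be sketched with Python classes. This is an assumption-laden illustration: since Figure 4 is not reproduced, we assume that consumable/reusable and perishable/non-perishable are the disjoint pairs, and the events `give_back` and `repair` are hypothetical additions. The helper `inherited` collects properties or events along the method resolution order, most general class first, and `check_disjointness` rejects a class that would belong to both members of a disjoint pair.

```python
from typing import Tuple


class Resource:
    properties: Tuple[str, ...] = ("reference", "wording", "creation_date", "state")
    events: Tuple[str, ...] = ("create", "consume", "delete", "ask")


class Consumable(Resource):
    pass


class Reusable(Resource):
    events = ("give_back",)  # hypothetical


class Perishable(Consumable):
    properties = ("sell_by_date",)
    events = ("take_delivery", "past_sell_by_date")


class NonPerishable(Consumable):
    pass


class Renewable(Reusable):
    pass


class Repairable(Reusable):
    events = ("repair",)  # hypothetical


# Assumed disjoint pairs (Figure 4 is not reproduced here).
DISJOINT = [(Consumable, Reusable), (Perishable, NonPerishable)]


def inherited(cls: type, attribute: str) -> Tuple[str, ...]:
    """Collect an attribute along the superclass chain, most general first."""
    collected = []
    for klass in reversed(cls.__mro__):
        for item in klass.__dict__.get(attribute, ()):
            if item not in collected:
                collected.append(item)
    return tuple(collected)


def check_disjointness(cls: type) -> None:
    """Raise if a class inherits from both classes of a disjoint pair."""
    for a, b in DISJOINT:
        if issubclass(cls, a) and issubclass(cls, b):
            raise TypeError(f"{cls.__name__} is both {a.__name__} and {b.__name__}")


print(inherited(Perishable, "properties"))
# ('reference', 'wording', 'creation_date', 'state', 'sell_by_date')
print(inherited(Repairable, "events"))
# ('create', 'consume', 'delete', 'ask', 'give_back', 'repair')
```

Python's multiple inheritance would also let a class combine compatible characteristics, which matches the remark that a resource can have more than one of them; the disjointness check is what forbids the incompatible combinations.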

For instance, the class of "perishable resource" is associated with the generic structure
depicted in Figure 5. It shows that (1) when a perishable resource arrives, an operator
triggers an event "take delivery"; this event creates the resource and initialises the agenda
to the sell-by date; (2) the agenda triggers the event "past sell-by date", which changes the
state of the resource to "out dated". Thus, the static structure of a perishable resource
comprises a "sell-by date", and its state property can have the value "out dated".
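The two behaviours of a perishable resource can be sketched as follows in Python. The `Agenda` class, the function names `take_delivery` and `advance_to`, and the dates are hypothetical; we also assume that a resource becomes "out dated" on the first day strictly after its sell-by date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class PerishableResource:
    reference: str
    sell_by_date: date          # extra structural property of perishables
    state: str = "available"    # may also take the value "out dated"


class Agenda:
    """Hypothetical agenda that remembers dates and reports those that have passed."""

    def __init__(self) -> None:
        self.deadlines: Dict[str, date] = {}

    def schedule(self, reference: str, when: date) -> None:
        self.deadlines[reference] = when

    def passed(self, today: date) -> List[str]:
        return [ref for ref, when in self.deadlines.items() if when < today]


def take_delivery(stock: Dict[str, PerishableResource], agenda: Agenda,
                  reference: str, sell_by: date) -> None:
    """(1) Triggered by an operator: create the resource and set the agenda."""
    stock[reference] = PerishableResource(reference, sell_by)
    agenda.schedule(reference, sell_by)


def advance_to(stock: Dict[str, PerishableResource], agenda: Agenda,
               today: date) -> List[str]:
    """(2) The agenda triggers "past sell-by date" for every expired resource."""
    expired = agenda.passed(today)
    for ref in expired:
        stock[ref].state = "out dated"
        del agenda.deadlines[ref]  # each resource goes out of date only once
    return expired


stock: Dict[str, PerishableResource] = {}
agenda = Agenda()
take_delivery(stock, agenda, "yoghurt-1", date(1993, 6, 17))
take_delivery(stock, agenda, "milk-1", date(1993, 6, 20))
print(advance_to(stock, agenda, date(1993, 6, 18)))  # ['yoghurt-1']
print(stock["milk-1"].state)                         # available
```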


Figure  5  :  generic  structure  for  a  perishable  resource 


4. Using generic structures

In this section, we present the general methodology for using generic structures and
develop an example showing the construction of a specification partially describing the
stock management of products in a company.

4.1  Methodology 

The generic structures presented in the previous sections can be supported by a library of
generic structures. This will change the nature of the Requirement Engineering activity
itself: the designer does not have to express the real-world phenomena using a formal
model, but rather has to identify those phenomena in the current application domain and
instantiate the generic structures associated with them to build specifications.

The  use  of  generic  structures  consists  of  the  following  steps: 


A)  Recognisin/i  a  phenomenon  of  the  application  domain  as  an  instance  of 
one  of  the  itemised  classes  of  phenomena  and  in  the  mean  time 
selecting  the  corresponding  generic  structure.  Until  now.  the 
recognition  of  the  phenomena  described  by  domain  dependent  generic 
structures  remain  at  the  designer’s  charge. 

B) Defining the correspondences between the elements described in the
generic structure and those which already belong to the current
specification. While doing so, instantiation rules emerge; they must
be applied consistently, e.g., use the same renaming rule for the same
terms in the different generic structures, avoid naming conflicts (using
the same name for two different purposes), and so on.

C) Understanding the elements which have not been put in correspondence
in the previous step and finding their equivalents in the application
domain, if they exist. This step improves the understanding of the
application domain and may lead to the discovery of new issues, which
can require going back to step (A) to instantiate additional generic
structures.
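The consistency rules of step B lend themselves to a mechanical check. A minimal sketch, assuming a generic structure is just a set of element names and a renaming is a dictionary from generic names to application-domain names; the `Specification` class and its `instantiate` method are hypothetical and are not taken from any tool described in the paper. It rejects a renaming that gives a term a different name than in an earlier generic structure, or that uses one name for two different terms, and it returns the unmatched elements that step C must examine.

```python
class Specification:
    """Hypothetical record of the current specification and its renaming rules."""

    def __init__(self):
        self.rules = {}        # generic term -> application-domain name, across structures
        self.elements = set()  # application-domain names already in the specification

    def instantiate(self, generic_elements, renaming):
        """Step B: check a renaming against earlier ones, then apply it.

        Returns the generic elements left without a correspondence (input to step C).
        Raises ValueError if the renaming breaks a consistency rule.
        """
        owner = {name: term for term, name in self.rules.items()}
        targets = list(renaming.values())
        if len(set(targets)) != len(targets):
            raise ValueError("naming conflict inside this renaming")
        for term, name in renaming.items():
            if term not in generic_elements:
                raise ValueError(f"{term!r} is not an element of this generic structure")
            # Same renaming rule for the same term in different generic structures.
            if self.rules.get(term, name) != name:
                raise ValueError(f"{term!r} was already renamed {self.rules[term]!r}")
            # The same name must not be used for two different purposes.
            if owner.get(name, term) != term:
                raise ValueError(f"{name!r} already stands for {owner[name]!r}")
        self.rules.update(renaming)
        self.elements.update(targets)
        return sorted(set(generic_elements) - set(renaming))


spec = Specification()
spec.instantiate({"resource", "consumer", "consume", "producer"},
                 {"resource": "product", "consumer": "customer", "consume": "buy"})
# -> ['producer']: still to be examined in step C
```

Raising on the first violated rule keeps the specification unchanged when a renaming is rejected, so the designer can correct it and retry.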


The process of using domain dependent generic structures ends when the designer can no
longer identify itemised real-world phenomena. The designer then has to complete the
current specification with the specific elements of the application domain not yet
described with the generic structures.

The RE activity is a decision process: the designer makes decisions all the time. In the
first step, he has to recognise application domain phenomena as instances of generic
structures, and this is a decision. In the second step, the decisions concern renaming,
and in the third step, refinement. In [8], we demonstrate how such decisions can be
handled in a knowledge-based formalism (i.e. the triplet <situation, decision, action>)
and successfully used in a CASE tool.
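The formalism of [8] is not reproduced here, but the shape of a <situation, decision, action> triplet can be sketched. A rough Python illustration, assuming a situation is a predicate over a dictionary describing the current specification, a decision is a label offered to the designer, and an action is a function that updates the specification; `Rule`, `applicable` and `decide` are hypothetical names.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """Hypothetical <situation, decision, action> triplet."""
    situation: Callable[[dict], bool]   # when the decision is relevant
    decision: str                       # the design decision being offered
    action: Callable[[dict], None]      # how the specification changes if it is taken


def applicable(rules, spec):
    """Decisions the designer can take in the current situation."""
    return [r.decision for r in rules if r.situation(spec)]


def decide(rules, spec, decision):
    """Take a decision by running the action of the first matching rule."""
    for r in rules:
        if r.decision == decision and r.situation(spec):
            r.action(spec)
            return True
    return False


# Step A as a rule: an unclassified entity may be recognised as a "resource".
rules = [
    Rule(situation=lambda s: "product" in s["entities"] and "product" not in s["resources"],
         decision="recognise 'product' as a resource",
         action=lambda s: s["resources"].add("product")),
]
spec = {"entities": {"product"}, "resources": set()}
```

The designer stays in charge: `applicable` only lists the options, and nothing changes until `decide` is called with one of them.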

4.2  The  stock  management  example 

We consider a company selling products to its customers. Customer requests are made
through orders. This first description of the application domain is shown in Figure 5.


Figure 5. A first description of the stock management company


In the following, we apply the methodology presented in the previous subsection to the
instantiation of the generic structure depicted in figure 3.


Step A: The entity "product" is recognised as belonging to the class "resource". The
generic structure described in Section has to be instantiated.

Step B: The correspondences are the following:

"resource" is the entity "product";

"consumer" is the actor "customer";

"consume" is the event "buy".

Step C: The following elements are renamed and added to the specification:

"producer" is identified as the actor "supplier";

"produce" and "ask" are identified as the events "deliver" and "re-order",
respectively;

"operator" is identified as the actor "warehouseman";

"delete" is identified as the event "withdraw".

The result of the instantiation of the generic structure is shown in Figure 6. For the sake
of clarity, note that we do not show the instantiation of the static aspect described by the
generic structure (figure 3).


Figure 6: the current specification after instantiating the generic structure about resource
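Steps B and C together amount to a renaming of the generic resource structure. The sketch below is a simplification under stated assumptions: it keeps only the elements named in Steps B and C, tags each with the kind the text gives it (entity, actor or event), and, like Figure 6, ignores the static aspect. The names `GENERIC_RESOURCE`, `STEP_B`, `STEP_C` and `instantiate` are hypothetical.

```python
# Elements of the generic "resource" structure that Steps B and C name,
# tagged with the kind given in the text; the static aspect is omitted.
GENERIC_RESOURCE = {
    "resource": "entity",
    "producer": "actor", "consumer": "actor", "operator": "actor",
    "produce": "event", "consume": "event", "ask": "event", "delete": "event",
}

# Correspondences with existing elements (Step B) and renamings of new ones (Step C).
STEP_B = {"resource": "product", "consumer": "customer", "consume": "buy"}
STEP_C = {"producer": "supplier", "produce": "deliver", "ask": "re-order",
          "operator": "warehouseman", "delete": "withdraw"}


def instantiate(generic, *renamings):
    """Rename every element of a generic structure; fail if one is left unmapped."""
    mapping = {}
    for renaming in renamings:
        mapping.update(renaming)
    missing = set(generic) - set(mapping)
    if missing:
        raise ValueError(f"no correspondence for {sorted(missing)}")
    return {mapping[name]: kind for name, kind in generic.items()}


current_spec = instantiate(GENERIC_RESOURCE, STEP_B, STEP_C)
# e.g. current_spec["warehouseman"] == "actor", current_spec["re-order"] == "event"
```

Requiring a correspondence for every element mirrors the end of step C: an element with no equivalent in the domain is reported instead of being silently dropped.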

Assume the products sold by the company are perishable. This assertion leads to the
following.

Step A: The entity "product" is recognised as belonging to the class "perishable
resource". The generic structure described in figure 5 has to be instantiated.

Step B: The correspondences are the following:

"resource" is the entity "product";

"operator" is the actor "warehouseman".

Step C: The following elements are renamed and added to the specification:

the event "take delivery" is identified as the event "reception";
the agenda keeps its name, as does the event "past sell-by date".

The  result  of  the  instantiation  of  the  generic  structure  is  shown  in  Figure  7. 


Figure  7:  after  instantiating  the  generic  structure  about  perishable  resource 


Assume the designer identifies no other real-world phenomena as belonging to the
itemised classes. Then the current specification must be completed according to the end
users' needs (e.g., to describe the delivery system). To keep the example simple, we
skip this step.

However, once the specification has been completed towards the conceptual description of
the users' needs, the designer should invest some effort in discovering new generic
structures. This is a difficult task. He has to concentrate on the elements of the
specification which have not been generated with the generic structures. A close look at
those elements should enable him to identify new classes of real-world phenomena, express
the corresponding generic structures and finally add them at the right place in the hierarchy.

5. Related Work

Some attempts to effectively guide the data base designer towards the design of a
conceptual model can be found in the research literature. In the SPADE project [9], this
is achieved by means of a "navigation knowledge dictionary". This dictionary helps the
designer to choose between design tasks but does not help to build specifications. Within
the ITHACA project, the RECAST tool [10] uses a dictionary and guides the designer in
reusing object specifications. However, the designer has to modify old specifications and
adapt them to the current application. Another approach, experimented with in the two expert
systems OICSI [11] and SECSI [12], generates part of the conceptual schema from a
natural language description of the application domain. In both cases, the result depends
on the completeness and accuracy of the natural language description. In [13], a learning-
from-examples strategy has been developed to induce form properties and functional
dependencies from day-to-day documents. Extensions of this work by Talldal [14] make it
possible to generate an extended E/R schema. This approach does not help to specify any
kind of behaviour. In [4], Maiden and Sutcliffe propose analogy as a means of reusing
specifications from a CASE repository during requirements analysis. An Intelligent
Reuse Advisor, based on cognitive models of specification reuse and automatic
analogical reasoning, helps the designer to retrieve and customise existing components.
Our work follows the same line as that of Reubenstein and Waters (with the
Requirements Apprentice [15]), who propose cliches as a means of building
specifications with a global approach.

In [16], the authors try to make software development knowledge explicit. They propose
"to base the software development process on the reuse of descriptive models of domain-
specific knowledge and prescriptive models of software engineering knowledge". The
domain models they propose are based on the same ideas as generic structures.

Approaches similar to generic structures have been developed in the software engineering
community. In [17], the authors propose to use a software base for enhancing the
flexibility of software systems by reusing generic programs. They identify classes of
applications and program lattices; the instantiation of these lattices makes it possible to
generate programs. In the PARIS system [18], partially interpreted schemas are used to produce
programs. A partially interpreted schema is a program that has some portions remaining
abstract or undefined; generic program skeletons are re-used. In [19], the
concept of design template is used to represent classes of tasks which are characterised by
their structure. Generic algorithm specifications as well as generic data interface
information are proposed to build programs. To some extent, application lattices,
partially interpreted schemas and design templates are to the production of
programs what generic structures are to the production of conceptual specifications.


However, none of the above approaches has tried to explicitly represent the design
knowledge used by designers, as generic structures do. We believe that generic structures
are well adapted to conceptual data base design. This activity is not guided by the goal
(the specifications to be built) but by the source (the end users' requirements). The
approach we propose does not force the designer to identify a goal, but to identify the
source and then to instantiate generic structures for the representation of these phenomena.
This kind of re-use is different from the re-use of software components or object schemas
as experimented, for instance, in the ITHACA project [20]. In fact, these two kinds of
re-use are complementary and should be combined in a CASE tool.

6. Conclusion

In this paper, we presented a new concept for the requirement engineering activity: the
generic structure. A generic structure is a formal and generic expression of a class of
real-world phenomena that are apparently different but designed in the same way. A generic
structure is independent of any particular application. A generic structure allows one to
re-use the process by which specifications are built, not the specifications themselves. The
designer does not have to redo the conceptualisation effort for the listed phenomena; he
simply instantiates the generic structures.

Two types of generic structures are mentioned in this paper; they correspond to two
types of knowledge used during the RE activity, namely domain and model knowledge.
In this paper, we focus on domain dependent knowledge and the associated generic
structures. The formalism used to describe generic structures is based on actor, event and
entity. However, generic structures are not model dependent; they can be used to build
any kind of specification. In [21], they are expressed with the Z language.

We have prototyped the idea of generic structures using the shell of the OICSI expert
system [11]. The experience has shown it to be a promising direction for research.

We want to stress that generic structures are classes of phenomena too; hence, it is
possible and interesting to develop generic structures for generic structures. These
meta-generic structures would characterise the concept of generic structure and how to
create and use them. We will investigate this in future research.

The domain dependent generic structures we put forward do not cover every real-world
phenomenon. This study will be extended to other types of phenomena. We are currently
studying how generic structures could be discovered using a learning-by-example tool,
with existing specifications as input.

Coupling our work with the classical approach for re-using software components is a
promising research direction. It would allow one to cover the whole data base development
cycle. The problem is to study how generic structures can match software components to
build specific projects.

References 

1. E. Dubois, J. Hagelstein, E. Lahou, F. Ponsaert, A. Rifaut: "A Knowledge Representation
Language for Requirements Engineering", Proceedings of the IEEE, Vol. 74, No. 10,
pp. 1431-1444, 1986.

2. B.W. Boehm: "Verifying and Validating Software Requirements and Design
Specifications", IEEE Software, Vol. 1, No. 1, January 1984.

3. A. M. Davis: "Software Requirements: Analysis and Specification", Prentice Hall,
1990.

4. N. Maiden and A. Sutcliffe: "Specification Reuse by Analogy", proceedings of 2nd
Workshop on the Next Generation of CASE Tools, Trondheim, Norway, May 1991.


5. G. Grosz, J. Brunet: "Transformation of Informal Entities into Object Schemas using
Generic Structures", submitted paper.

6. G. Grosz: "Formalisation des connaissances réutilisables pour la conception des
systèmes d'information", PhD dissertation, University Paris 6, December 1991.

7. G. Grosz: "Building Information System Requirements using Generic Structures", to
appear in the Proceedings of the 16th COMPSAC Conference, Chicago, September 1992.

8. G. Grosz, C. Rolland: "Knowledge Support for Information System Design", in the
proceedings of "The 5th Jerusalem Conference on Information Technology", pp. 261-268,
October 1990, Jerusalem, Israel.

9. V. Seppanen, M. Heikkinen, R. Lintulampi: "SPADE - Towards CASE Tools That Can
Guide Design", proceedings of the CAiSE 91 Conference, pp. 222-239, Trondheim,
Norway, May 1991.

10. M. G. Fugini, B. Pernici: "RECAST: A Tool for Reusing Requirements", in Advanced
Information Systems Engineering, B. Steiner, A. Solvberg, L. Bergman (eds.),
Springer Verlag Lecture Notes in Computer Science, 1990.

11. C. Rolland, C. Proix: "An Expert System Approach to Information System Design",
in IFIP World Congress 86, Dublin, 1986.

12. M. Bouzeghoub, G. Gardarin, E. Metais: "Database Design Tool: an Expert System
Approach", Proc. of the 11th VLDB Conference, Stockholm, August 1985.

13. M. V. Mannino, V. P. Tseng: "Inferring Database Requirements from Examples in
Forms", 7th International Conference on Entity-Relationship Approach, pp. 1-25,
Rome, Italy, 1988.

14. B. Talldal, B. Wangler: "Extracting a Conceptual Model from Examples of Filled in
Forms", Proc. of Int. Conference COMAD, pp. 327-350, N. Prakash (ed.), New Delhi,
India, December 1990.

15. H.B. Reubenstein, R.C. Waters: "The Requirements Apprentice: Automated Assistance
for Requirements Acquisition", IEEE Transactions on Software Engineering, Vol. 17,
No. 3, pp. 226-240, March 1991.

16. G. Arango, P. Freeman: "Modeling knowledge for software development", in Proc. 3rd
Workshop on Software Specification and Design, IEEE Computer Society Press (ed.),
London, August 1985, pp. 63-66.

17. R. T. Mittermeir, M. Oppitz: "Software Bases for the Flexible Composition of
Application Systems", IEEE Transactions on Software Engineering, Vol. 13, No. 4,
pp. 440-460, April 1987.

18. S. Katz, C.A. Richter, K.S. The: "PARIS: A System for Reusing Partially Interpreted
Schemas", in the proceedings of the 9th Software Engineering Conference, pp. 377-384,
1987.

19. M. T. Harandi, F. H. Young: "Template Based Specification and Design", in Proc. 3rd
Workshop on Software Specification and Design, IEEE Computer Society Press (ed.),
London, August 1985, pp. 94-97.

20. B. Pernici, G. Vaccari, K. Villa: "INTRES: INTelligent REquirements Specification",
proceedings IJCAI'89 Workshop on Automatic Software Design, Detroit, Michigan,
USA, August 1989.

21. G. Grosz, C. Kung, C. Rolland: "Generic Structures: A Reuse-Based Approach to
Conceptual Design of Information Systems", proceedings WITS conference, Dallas,
Texas, December 1992.


Development  of  a  Programming  Environment 
for  Intelligent  Robotics* 


Stefano Caselli¹, Antonio Natali² and Francesco Zanichelli¹


¹ Dipartimento di Ingegneria dell'Informazione
Università di Parma - Italy

² Dipartimento di Elettronica, Informatica e Sistemistica
Università di Bologna - Italy
e-mail:  mczane@verdi.eng.unipr.it 


Abstract. Given the wide range and diversity of proposed architectures
for autonomous robotic agents, an essential role can be played by a
programming environment not hardwired to any particular architecture.
This paper discusses the design of a programming environment for robotic
systems that promotes rapid prototyping techniques for building different
robotic architectures while retaining good efficiency in robot control.
The environment has been conceived as the integration of a real-time
robotic machine with a full-fledged logic-based system.


1  Introduction 

Perhaps the most critical and challenging issue in designing robotic systems that work
in dynamic, unstructured environments is the definition of their architecture. The term
architecture is used here to designate the logical organization of a system constituted by
a network of (autonomous) agents that can achieve goals, under real-time constraints, in
not completely predictable environments, by using imperfect sensors and actuators. Each
architecture proposes a particular methodology encompassing issues such as
function/competence decomposition and interaction, computational
homogeneity/heterogeneity, distribution of decision responsibility, and static/dynamic
optimization of the bounded computational resources available.

The literature reports a wide spectrum of approaches to the design of robot
architectures. Classical approaches promoted hierarchical architectures [1, 2],
with provision for different abstraction and response time levels. The main
design principle is functional decomposition, together with a strict vertical
control/data flow among the components, looping through perception, modeling,
planning, execution and motor control. These efforts are loosely related to
traditional AI in the sense that the highest level is typically a planner, which deals
with a symbolic representation of robot actions and of the world. More recently,

* This work has been partially supported by CNR "Progetto Finalizzato Robotica",
"Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo", and Special Program
N. 92.00065.CT12.


research on mobile robots has led to the development of so-called reactive and
behavior-based architectures [3, 4], which are inspired by the principle that
intelligence is determined by the dynamics of interaction with the physical world.
The implicit integration of simple task-achieving behaviors into higher-order robot
capabilities, exploiting direct sensor information instead of intermediate
representations, can be regarded as the major design tenet of this line of research. Since
both approaches have their strengths and weaknesses, the most recent trend is
to integrate reaction and planning by proposing hybrid architectures [5, 6].

Given such a wide range and diversity of architectures, an essential role can be
played, at least for those free of any constraint dictated by philosophical
commitment, by a programming environment not hardwired to any particular
architecture. The main goal of such an environment should be to enable users
to exploit rapid prototyping techniques to build different kinds of robotic
architectures, while retaining good efficiency in robot control. However, designing
a robotic system is not only a matter of defining its logical architecture. It also
requires dealing with hardware interfacing, the drive system, real-time servos, smooth
trajectory interpolation and similar concrete, time-consuming issues which together
determine the basic capabilities and dexterity upon which "intelligent"
behavior is built. From a pragmatic point of view, the environment whose
development is presented in this paper is also intended to eventually become a
general workbench for exercising AI techniques in a controlled-complexity robotic
domain, as well as to support the design and simulation of robotic workcells.

The  paper  is  organized  as  follows.  In  section  2  the  requirements,  both  general 
and  specific,  of  a  programming  environment  for  robotics  are  discussed.  The  two 
main  components  of  the  environment  are  then  presented  in  sections  3  and  4. 
Section  5  describes  the  overall  programming  model  of  the  integrated  environment 
whose  development  is  currently  in  progress,  whereas  in  section  6  the  prototyping 
of  a  sample  application  is  illustrated.  A  brief  discussion  summarizes  the  paper. 


2  Requirements  and  Issues 

Since the goal is the definition of an integrated environment to explore issues
in high-level as well as in low-level robot control, besides general software
engineering requirements:

- open, extensible organization

- full integration among subcomponents

- scalability in the computational resources handled

the environment should be characterized by the following domain-specific
attributes:

- integration of different computational and programming paradigms

- simulation capabilities for preliminary evaluation

- metareasoning facilities


The multi-paradigm requirement seems mandatory when one aims at
developing hybrid architectures, which generally partition the system into
different levels exploiting explicit declarative-style knowledge along with subsymbolic
task-achieving reactions.

Different design approaches have to be compared mostly experimentally,
since general formal verification methods are not available. For a
preliminary assessment of performance, simulation is considered an almost
necessary tool. Besides the visual representation of robot structure and kinematics,
task execution in virtual workspaces can give insights into the robot's operational
behavior, to an extent which varies with the accuracy of the
simulation. Whatever degree of accuracy is targeted, sensory information has to be
generated starting from models of the physical objects (including the sensors
themselves) and of the interaction among them, the robot and other agents in the
robot workspace.

The ultimate performance evaluation of the design comes from the
interaction of the physical robot with its actual environment. Thus, an integrated
environment should support a straightforward transition from simulation to real
execution. The gap between the two can be reduced when integration issues are addressed
from the beginning of the environment design, in particular by employing the same
control interface and referring to the same set of commands for the simulated
and the real robot. Moreover, since adaptation to changes in the world is one
of the main requirements of intelligent, sensor-based robots, sensor data has to
be made available to user programs; hence the set of simulated sensors should
correspond as far as possible to the set of sensors available to the user.
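The requirement of a single control interface for the simulated and the real robot can be pictured as an abstract class with interchangeable backends. A minimal Python sketch, assuming poses are plain coordinate tuples and sensors are looked up by name; `RobotInterface`, `SimulatedRobot` and `approach` are hypothetical, and a backend driving real hardware is deliberately left out.

```python
from abc import ABC, abstractmethod


class RobotInterface(ABC):
    """Hypothetical command set shared by the simulated and the real robot."""

    @abstractmethod
    def move_to(self, pose):
        """Move the end effector to a pose (here simply a tuple of coordinates)."""

    @abstractmethod
    def read_sensor(self, name):
        """Return the current reading of the named sensor."""


class SimulatedRobot(RobotInterface):
    """Backend whose sensor readings are generated from models, not hardware."""

    def __init__(self, sensor_models):
        self.pose = (0.0, 0.0, 0.0)
        self.sensor_models = sensor_models   # sensor name -> function(pose) -> reading

    def move_to(self, pose):
        self.pose = tuple(pose)

    def read_sensor(self, name):
        return self.sensor_models[name](self.pose)


def approach(robot: RobotInterface, target, sensor="distance"):
    """A user program written only against the shared interface."""
    robot.move_to(target)
    return robot.read_sensor(sensor)


# A wall at x = 1.0 seen by a distance sensor (a made-up model).
sim = SimulatedRobot({"distance": lambda pose: abs(1.0 - pose[0])})
print(approach(sim, (0.25, 0.0, 0.0)))   # 0.75
```

Because `approach` depends only on the abstract interface, switching from simulation to execution would mean passing a different backend object, with the same set of sensor names on both sides.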

Finally, the environment should provide an explicit representation of the
agent architecture in order to enhance the system with some degree of
self-awareness and metareasoning. Such a feature enables a uniform approach to
planning, monitoring, debugging and system reconfiguration.

Current robot programming tools are generally far from adequate to fulfil
these requirements, since they are too heavily committed to low-level
control and real-time issues, even though a number of recent proposals achieve
considerable results in restricted areas [7, 8, 9].

Programming  tools  should  instead  provide  a  unifying  framework,  allowing 
users  to  exploit  a  spectrum  of  software  technologies,  from  efficient  real-time 
programming  to  reconfigurable  knowledge-based  symbolic  programming.  The 
approach  we  are  following  in  this  research  is  to  conceive  a  robot  programming 
environment  as  the  integration  of  a  symbolic  system  constituted  by  a  full-fledged 
logic  environment  with  a  flexible  real-time  system  explicitly  designed  for  robotic 
control. 

As our primary application domain is dextrous manipulation and exploration
using robot arms and hands [10, 11], our real-time technology is based on a
control system explicitly conceived for this kind of robotic application, namely
RCCL, the Robot Control C Library [12]. RCCL offers a number of advantages
over industrial controllers, which are discussed in section 3.


As regards the knowledge-based technology, our choice has been to exploit
a programming environment based on an extension of Prolog with features for
dynamic modularity and concurrency. Such an extension, called here CPUX
(Communicating Prolog Units with conteXts), allows users to exploit Prolog as a
system programming language.

The  aim  of  this  work  is  to  describe  how  such  an  integration  is  achieved  and 
how  the  resulting  system  can  be  exploited  to  define  many  relevant  abstractions 
involved  in  robotic  architectures  and  extendible  programming  tools. 


3  The  Real-Time  Machine 

RCCL provides a robotic virtual machine for low-level control that is both powerful
and extendible, fully exploiting the available hardware resources, mechanics and sensory
capabilities, and yet abstracting away a great deal of implementation detail not relevant
to high-level control and task planning or programming.

Basically, RCCL is a library of routines for describing and controlling robot
positions and actions, combined with a trajectory generator for realizing these
actions [12]. An RCCL application program is written in C and uses special
primitives to specify robot action requests and queue them for servicing by the
trajectory generator. Thus, RCCL is primarily a library of kinematics and control
functions along with run-time support for task execution.
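The split between the application program, which queues action requests, and the trajectory generator, which services them, can be illustrated with a small producer/consumer sketch. This is not the RCCL API: it is written in Python rather than C, the class and method names are hypothetical, positions are plain coordinate tuples instead of RCCL's Cartesian frames, and each request is serviced by straight-line interpolation over a fixed number of servo ticks.

```python
from collections import deque


class TrajectoryGenerator:
    """Hypothetical stand-in for a trajectory generator servicing queued motions."""

    def __init__(self, start, steps_per_move=4):
        self.position = tuple(start)
        self.queue = deque()               # pending motion requests, FIFO
        self.steps_per_move = steps_per_move

    def request_move(self, target):
        """Application side: queue a motion request and return immediately."""
        self.queue.append(tuple(target))

    def run(self):
        """Control side: yield one interpolated set-point per servo tick."""
        while self.queue:
            target = self.queue.popleft()
            start = self.position
            for k in range(1, self.steps_per_move + 1):
                s = k / self.steps_per_move
                self.position = tuple(a + s * (b - a) for a, b in zip(start, target))
                yield self.position


tg = TrajectoryGenerator((0.0, 0.0), steps_per_move=2)
tg.request_move((2.0, 0.0))
tg.request_move((2.0, 4.0))
print(list(tg.run()))   # [(1.0, 0.0), (2.0, 0.0), (2.0, 2.0), (2.0, 4.0)]
```

Queuing decouples the two sides: the application can issue several requests ahead of time, while the generator consumes them at its own fixed rate.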

RCCL has been chosen as the execution architecture for the programming
environment (i.e., for implementing the low-level control law) since it provides:

- development in a flexible and friendly environment (UNIX, C, workstation)
rather than in an obsolete and rigid industrial robot controller using an
awkward, inflexible language;

- the possibility of creating special sensor interfaces, complex tasks, and
synchronizations among multiple robots or other agents;

- a well-defined standard with excellent documentation and a number of
installation sites worldwide.

RCCL is supplied with a simple graphic simulation tool, robotsim. In robotsim
the programmed task is visualized and can be evaluated with respect to
kinematics and workspace features. The particular value of robotsim lies in its tight
integration with the RCCL control structure. Thus there is no gap between
actual and simulated execution of programmed tasks, as the very same control
structure is exploited for both. Due to its flexibility, RCCL permits programming
of tasks of higher complexity than afforded by typical industrial controllers.
By virtue of extensions we have made to the simulator, even these highly complex
tasks may be directed to either the real robot or its simulation in an almost
transparent manner.

Our extensions to robotsim, although based only on geometric modeling of
objects, have allowed the simulation of simple grasp procedures as well as the
generation of contact and distance sensor feedback, which is routed to user
RCCL programs. The resulting robotsim-II also includes support for the presence
of active objects with customizable motion laws, and for interactively modifying
the environment. While the overall simulation model appears limited, it
seems useful for experimenting with and evaluating robot architectures and
strategies in a controlled-complexity domain. The integrated environment supports
straightforward configuration of the simulated task, including the robot sensor set
and workspace.
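Generating distance and contact feedback from purely geometric models can be as simple as measuring the gap between a sensor point and the object shapes. A sketch under our own simplifying assumptions, not a description of robotsim-II: objects are axis-aligned boxes, the distance sensor is a single point whose reading saturates at a maximum range, and contact is declared below a small tolerance. The function names and the parameters `contact_eps` and `max_range` are hypothetical.

```python
import math


def point_box_distance(p, lo, hi):
    """Euclidean distance from point p to the axis-aligned box [lo, hi]; 0 inside."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(p, lo, hi)))


def simulate_sensors(sensor_pos, boxes, contact_eps=1e-3, max_range=1.0):
    """Return (distance reading, contact flag) for one point-like sensor."""
    d = min((point_box_distance(sensor_pos, lo, hi) for lo, hi in boxes),
            default=math.inf)
    reading = min(d, max_range)      # the range sensor saturates at max_range
    return reading, d <= contact_eps


part = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))          # one unit-cube object
print(simulate_sensors((1.5, 0.5, 0.5), [part]))   # (0.5, False)
print(simulate_sensors((1.0, 0.5, 0.5), [part]))   # (0.0, True)
```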

4  The  CPUX  Programming  System 


[Figure residue. CPUX components: C-Prolog interpreter; P-units, coroutines, reflection; X-Window interface; process scheduling (in, out, read); context and knowledge structuring. A subset of ALPES tools: editor, browser, debugger, tracer, explanator, partial evaluator.]


Fig.  1.  The  CPUX  system  and  its  relationship  with  ALPES 


The environment supports (and is itself based on) an extension of Prolog with
features for dynamic modularity and concurrency. Such an extension, called
CPUX (Communicating Prolog Units with conteXts) [13], allows users to exploit
Prolog as a system programming language. As concerns concurrency, agents can
be explicitly created and destroyed, each agent working as an autonomous CPUX
machine. Mechanisms for agent interaction are provided which can support
mailbox-based communication models as well as generative models à la Linda
[14]. Agents sharing the same processor are managed by a scheduler written
entirely in CPUX. Process interaction policies can be expressed as meta-programs,
without changing object-programs or the supporting machine. As concerns
modularity, CPUX fully supports contextual logic programming [16], a programming
paradigm in which software components can be statically and dynamically
configured by composing modules (CPUX units) into structures called contexts.
Through contexts, users can prototype new abstractions and exploit these
prototypes to create other prototypes incrementally, much as in object-oriented
programming.
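The generative communication model mentioned above can be illustrated with a Linda-style tuple space; Fig. 1 lists in, out and read alongside process scheduling. This Python sketch only shows the semantics and is not the CPUX implementation, which is written in Prolog. Assumptions: `None` in a pattern acts as a wildcard, matching is by tuple length and field equality, `rd` stands for read, and the method is called `in_` because `in` is a Python keyword.

```python
import threading


class TupleSpace:
    """Minimal Linda-style tuple space; None in a pattern matches any field."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Add a tuple and wake up any agent blocked in rd or in_."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _match(self, pattern):
        # Called with the lock held: first tuple of the same length whose fields
        # equal the non-wildcard fields of the pattern.
        for t in self._tuples:
            if len(t) == len(pattern) and all(
                    p is None or p == f for p, f in zip(pattern, t)):
                return t
        return None

    def rd(self, pattern, timeout=None):
        """Block until a matching tuple exists and return it, leaving it in place."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._match(pattern) is not None, timeout):
                raise TimeoutError(pattern)
            return self._match(pattern)

    def in_(self, pattern, timeout=None):
        """Like rd, but withdraws the matching tuple from the space."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._match(pattern) is not None, timeout):
                raise TimeoutError(pattern)
            t = self._match(pattern)
            self._tuples.remove(t)
            return t


# Usage: a consumer blocks until another agent publishes a matching tuple.
space = TupleSpace()
producer = threading.Thread(target=lambda: space.out(("position", "arm", 0.5)))
producer.start()
print(space.in_(("position", "arm", None)))   # ('position', 'arm', 0.5)
producer.join()
```

Unlike a mailbox, the tuple space does not name a receiver: any agent whose pattern matches may read or withdraw the tuple, which is what makes the model generative.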


The most significant architectural features of CPUX and the related programming
methodologies have been exploited for prototyping an integrated, open
and evolving programming environment in the ALPES Esprit project 973 of
the European Community [17]. The relationship between CPUX and ALPES is
illustrated in Fig. 1.


5  The  Environment  Programming  Model 


The simple integration of ALPES and RCCL around a client-server interaction
scheme has proven extremely valuable even from a “conventional” robot
programming perspective, i.e. when the purpose of the environment is to assist
users in developing and testing robot applications in the framework of robot-
or task-level programming [18]. Efficiency concerns are addressed mostly in the
RCCL control subsystem, although recent work has reported performance
levels of compiled Prolog comparable to those obtained by leading C compilers
[19].

The integration between the two subsystems has provided immediate results
without sacrificing the features originally offered by RCCL. Some of these results
are related to the CPUX system language, while others are related to ALPES
features and tools. CPUX features are particularly suited to prototyping and also
to debugging at a high level of abstraction, a crucial activity in developing
robotic applications. ALPES tools, written in CPUX as modular, extendible
components, have been specialized to build high-level interfaces. In particular,
the ALPES X-Window browser has been extended to create a tool which allows
the user to define interactively the initial setup for task execution under
the robotsim-II simulator, based on database modules containing taxonomies of
known robots, available sensors, and machines/parts to be chosen and included
in the task.

However, the main result of the integrated environment is to promote a
programming model founded on the concept of an agent society, in which agents
interact through a shared abstraction called the world (Fig. 2). This multiagent
model emphasizes the autonomy and heterogeneity of agents from a Distributed
Artificial Intelligence perspective, amalgamating different granularities and
architectures. Agents can be heterogeneous: the only requirement is that every
agent agrees on the abstract communication interface constituted by the world
(implemented as a CPUX communication unit). To this end, the RCCL real-time
machine has been extended to define a special agent, the RCCL agent, which acts
as a server: it looks for a command in the world and executes it.

In conclusion, this system makes immediately available general-purpose
abstractions and mechanisms for important aspects of robot programming,
such as agent interaction, agent configuration and design, and meta-control of
agent behavior.




Fig.  2.  Overall  framework  of  the  system 


5.1  Agent  Interaction 

Information stored in the world can represent sensor-related knowledge,
self-knowledge and agent-specific knowledge.

Sensor knowledge consists of a high-level representation of sensor data.
In our current environment it is automatically updated by the low-level extended
RCCL system at user-defined rates. Self-knowledge represents knowledge about the
network configuration and the state of agents, e.g. computational state (waiting
for a message, waiting for a processor), execution time, period, et cetera (this
information is used by the CPUX scheduler). Agent-specific knowledge consists of
messages explicitly produced and consumed by agents in order to
cooperate or interact. Starting from the world knowledge, each agent can create
local world models, to be used in a shared or private way. Thus, this architecture
is particularly suited to the design and development of hybrid robot
architectures. However, it does not exclude experimenting with behavioral
approaches, since this notion of world generalizes a notion of world comprising
only external physical sensory input.

CPUX process interaction mechanisms support agent interaction models similar
to the generative communication of Linda, and are expressed by the following
primitive predicates:

- world -? msg(M): like in, withdraws the message from the world, with possible suspension;

- world -?? msg(M): like rd, reads the message from the world;

- world -! msg(M): like out, adds the message to the world.
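The three primitives behave like operations on a shared tuple space. As an illustration only, the Python sketch below mimics them with a condition variable; it is not CPUX code, and the class `World`, its methods `take`/`read`/`put` (standing for `-?`, `-??` and `-!`), the use of `None` as a wildcard field and the `timeout` parameter are assumptions of this example. Whether `-??` suspends in CPUX is not stated above; here it waits, as Linda's rd does.

```python
import threading
from typing import List, Optional, Tuple

# Hypothetical stand-in for the CPUX world; None in a pattern is a wildcard.
Pattern = Tuple[Optional[object], ...]


class World:
    """A Linda-like shared bag of tuples guarded by one condition variable."""

    def __init__(self) -> None:
        self._tuples: List[tuple] = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern: Pattern, tup: tuple) -> bool:
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern: Pattern) -> Optional[int]:
        for i, tup in enumerate(self._tuples):
            if self._matches(pattern, tup):
                return i
        return None

    def put(self, tup: tuple) -> None:
        """world -! msg: add a tuple and wake up suspended agents."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _wait(self, pattern: Pattern, timeout: Optional[float]) -> Optional[int]:
        # Caller holds the lock; suspend until a match exists or timeout expires.
        self._cond.wait_for(lambda: self._find(pattern) is not None, timeout)
        return self._find(pattern)

    def take(self, pattern: Pattern, timeout: Optional[float] = None) -> Optional[tuple]:
        """world -? msg: like Linda's in, removes a matching tuple."""
        with self._cond:
            i = self._wait(pattern, timeout)
            return None if i is None else self._tuples.pop(i)

    def read(self, pattern: Pattern, timeout: Optional[float] = None) -> Optional[tuple]:
        """world -?? msg: like Linda's rd, leaves the tuple in the world."""
        with self._cond:
            i = self._wait(pattern, timeout)
            return None if i is None else self._tuples[i]


# Usage: a server agent consumes one command, as the RCCL agent does.
world = World()
executed: List[str] = []


def rccl_agent() -> None:
    cmd = world.take(("command", None))
    executed.append(cmd[1])


server = threading.Thread(target=rccl_agent)
server.start()
world.put(("command", "move_to(home)"))
server.join()
print(executed)  # ['move_to(home)']
```

A single condition variable keeps the sketch short; it says nothing about which suspended agent runs next, which in CPUX is the scheduler's job.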

These mechanisms can express both cooperative and competitive agent
interaction schemes. However, robot applications may require more specific
mechanisms and concepts, in which case the environment can be extended accordingly.

For example, in fully competitive interaction schemes, the global control law
has to emerge from a collection of different, independent, local laws. In order
to achieve such a behavior, some (meta-)control level or decisor component can
be introduced. Once the designer has chosen how to implement the decision
system, a new abstraction should be made available in the system. Such a new
communication abstraction can be implemented in CPUX as shown in [15]. In
fact, CPUX processes are abstractions of independent coroutines which can
explicitly transfer control to each other, and all CPUX process interaction
operators are themselves written in Prolog.

5.2  Agent  Configuration  and  Design 

Robot applications demand a wide spectrum of agent types, ranging from
purely deductive goal-directed activities to completely behavior-based ones.

Since CPUX supports both declarative and procedural programming styles,
such a spectrum of agent classes can be easily expressed. Contextual
programming provides a powerful tool to dynamically compose agents as a hierarchy of
different knowledge units (open or closed to variations). In fact, through the
extension operator U >> G, goal G is solved in a new context obtained by adding
the clauses of unit U on top of the current context. In this way, an agent such
as a classical STRIPS-like planning system can be dynamically configured by
creating a context composed of a unit implementing the core algorithm and a unit
containing domain-specific state and operator descriptions.
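A rough analogy for the extension operator, in Python rather than CPUX: treat each unit as a mapping from predicate names to definitions and a context as a stack of units searched from the top, so that `U >> G` corresponds to solving `G` in `context.new_child(U)` of a `collections.ChainMap`. This mirrors only the lookup order of contextual logic programming (no backtracking, no open/closed units); the unit names and contents below are hypothetical.

```python
from collections import ChainMap
from typing import Dict

# A unit maps predicate names to definitions; all contents are hypothetical.
Unit = Dict[str, object]

strips_core: Unit = {  # domain-independent planner core
    "plan": lambda ctx: f"plan with operators {sorted(ctx['operators'])}",
}
blocks_domain: Unit = {  # domain unit: operator names for a blocks world
    "operators": {"pickup", "putdown", "stack", "unstack"},
}


def extend(context: ChainMap, unit: Unit) -> ChainMap:
    """U >> G analogue: a new context with unit U on top of the current one."""
    return context.new_child(unit)


def solve(context: ChainMap, goal: str) -> object:
    """Use the definition from the topmost unit that defines the goal."""
    definition = context[goal]
    return definition(context) if callable(definition) else definition


planner = extend(ChainMap(strips_core), blocks_domain)
print(solve(planner, "plan"))
# plan with operators ['pickup', 'putdown', 'stack', 'unstack']

# A more specific unit shadows the same name without touching the lower units.
restricted = extend(planner, {"operators": {"pickup", "putdown"}})
print(solve(restricted, "plan"))
# plan with operators ['pickup', 'putdown']
```

The core unit receives the whole current context, so the same `plan` definition picks up whatever operators the unit on top supplies.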

In several domains some agents have to be defined as periodic activities,
possibly with fixed deadlines. Such a class of agents can be implemented in CPUX
thanks to the possibility of handling asynchronous events (timing interrupts).
Moreover, suitable environment tools can be defined to help programmers check,
during the design phase, for the existence of a feasible schedule. At this level,
the typical techniques of constraint logic programming can be exploited.
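The text leaves the feasibility check to constraint logic programming. As a much simpler stand-in, and not the authors' technique, the Python sketch below applies the classical Liu and Layland utilization test: independent, preemptive periodic tasks whose deadlines equal their periods can be scheduled on one processor by earliest-deadline-first if and only if the sum of execution time over period does not exceed 1. It ignores scheduling overhead, blocking and non-periodic agents, and the agent names and timings are invented.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass(frozen=True)
class PeriodicAgent:
    name: str
    wcet: int    # worst-case execution time per activation (ms)
    period: int  # activation period (ms); deadline assumed equal to period


def utilization(agents: List[PeriodicAgent]) -> Fraction:
    # Exact rational arithmetic avoids rounding errors right at the bound.
    return sum((Fraction(a.wcet, a.period) for a in agents), Fraction(0))


def edf_feasible(agents: List[PeriodicAgent]) -> bool:
    """Preemptive EDF on one processor meets all deadlines iff U <= 1."""
    return utilization(agents) <= 1


# Invented timing figures, for illustration only.
agents = [
    PeriodicAgent("sonar_poll", wcet=2, period=10),
    PeriodicAgent("servo_update", wcet=5, period=20),
    PeriodicAgent("replan", wcet=20, period=50),
]
print(utilization(agents), edf_feasible(agents))  # 17/20 True
```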

In order to increase modularity and flexibility, a class of dynamically
configurable agents may be required to merge goal-oriented decision procedures
with opportunistic decisions related to the current state of the world. Again,
contextual mechanisms can constitute the basic support. For example:

agent-B :-
    world -?? situation(Current),
    % Get global knowledge for the recognized situation
    B-default >> Current >> g.

agent-B :-
    B-default >> g.

The agent with goal agent-B behaves according to the specific recognized
situation, if any, generated by another “perceptual” agent. In performing activity
g, it takes into account the generic B-default assumptions only if more specific
and relevant ones are not available.

5.3  Meta  Control 

Another critical part of multiagent robot systems is agent network configuration,
which can be established both statically, as a consequence of user design
or of a planning activity, and dynamically, as a consequence of new user
commands or of a meta-control layer. Meta-level programming features of CPUX
(the todemo/odemo reflection mechanisms described in [21]) can be exploited to
allow users to modify current behavior at two different levels:

1. at the agent level, without modifying the overall architecture (network of agents);

2. at the decision network level, without modifying local agent behavior.

Assume for example that agent-1 is currently in charge of issuing commands
to the RCCL agent. Later, it has to compete with the decisions of a new
agent-2. This kind of dynamic reconfiguration can be done without modifying
agent-1, by dynamically associating with it a meta-unit meta-1 in which a
meta-rule routes to-rccl messages to a decisor component. In
general, meta-rules have the following form:

todemo(CurrUnit, AuxReg, CurrProcess, CurrGoal) :- <body>.

which means “to demonstrate CurrGoal within CurrProcess, perform
the <body> actions”. These actions can express different kinds of meta-control
policies, such as:

-  route  the  message  to  another  agent; 

-  send  another  message; 

-  kill  (another)  process; 

- monitor the agent behavior.
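The effect of attaching meta-1 to agent-1 can be loosely approximated outside Prolog by wrapping an agent's send operation: the object-level agent is left untouched, and a meta-level hook decides where each message goes. In this Python sketch the names `Agent`, `with_meta`, the inboxes and the `to_rccl` tag are hypothetical; CPUX obtains the same effect through `todemo` reflection, not through wrappers.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]          # (tag, payload)
Send = Callable[[Message], None]


class Agent:
    """Object-level agent: it knows nothing about routing policies."""

    def __init__(self, name: str, send: Send) -> None:
        self.name = name
        self.send = send

    def step(self, payload: str) -> None:
        self.send(("to_rccl", payload))


def with_meta(send: Send, policy: Callable[[Message], Optional[Send]]) -> Send:
    """Meta-level hook: the policy may redirect a message; otherwise the
    original send is used, so object-level behaviour is unchanged."""
    def meta_send(msg: Message) -> None:
        target = policy(msg)
        (target or send)(msg)
    return meta_send


rccl_inbox: List[Message] = []
decisor_inbox: List[Message] = []

agent_1 = Agent("agent_1", rccl_inbox.append)
agent_1.step("move(x, 10)")            # before reconfiguration: goes to RCCL

# agent_2 appears: attach a meta-level policy to agent_1 without editing Agent.
agent_1.send = with_meta(
    agent_1.send,
    lambda msg: decisor_inbox.append if msg[0] == "to_rccl" else None,
)
agent_1.step("move(x, 20)")            # now routed to the decisor
print(rccl_inbox, decisor_inbox)
# [('to_rccl', 'move(x, 10)')] [('to_rccl', 'move(x, 20)')]
```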


6  Prototyping  an  Application 

In this section we introduce a simple example of a robotic application and
schematically show how the integrated environment and its mechanisms can assist the
user in the development process. Only a few relevant aspects of the
overall task are presented.

The application consists of a task for a robot manipulator, combining the
exploration of an unknown workspace and the manipulation of blocks. By means
of its sonars, the robot has to recognize a number of block stacks and to obtain,
via block manipulation, a new stack organization. The environment is thus very
similar to a classical blocks world, although important extensions can also be
handled, such as the presence of obstacles and a dynamically changing environment.

Figure  3  shows  a  typical  initial  situation  for  the  simulated  manipulator  task. 
The  robot  in  this  example  is  configured  to  have  two  sonar  range  sensors,  one 
aligned  with  the  gripper  (z-sonar),  the  other  rotated  by  90  degrees  (x-sonar). 

The problem requires safe hand navigation in an unknown workspace, while
perceptual activities attempt to determine the configuration of blocks in the
workspace. For the sake of simplicity, some conditions are assumed along with some
a priori knowledge: the workspace is one-dimensional, and obstacles are higher
than any block stack, since they are actually containers of blocks.




Fig.  3.  An  example  of  initial  workspace 


Fig.  4.  Safe  navigation  over  the  blocks 


Agent follow-height drives the robot hand safely through the workspace,
adjusting the direction of the trajectory so as to maintain the hand over the
blocks. Agent detect-obstacle monitors the front sonar to detect the presence
of obstacles ahead. When an obstacle is perceived, detect-obstacle sends a
message to the world informing other agents that a container is either approaching
or getting farther away. The perceptual interpretation of the workspace, which in this
case aims at obtaining a world representation in terms of block stacks, is a good
example of contextual agents. In fact, the perceive-blocks agent derives the
configuration of blocks from z-sonar data differently in context no-container
than in context over-container, on account of the presence of the container
bottom.

Finally, as soon as perception has taken place, planning activity can be
carried out while reactive navigation can be stopped. Agent strips is triggered
in context blocks to synthesize the sequence of robot actions which reaches
the user-defined goal state. The list of operators is then sent to the world,
where an execute agent transforms this high-level description into movement
messages for the RCCL agent.
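To make the role of agent strips concrete, here is a textbook STRIPS formulation of the blocks world with breadth-first forward search, in Python. It is a hedged stand-in for the planner unit plus the blocks context, not the authors' code: the operator names (`pickup`, `putdown`, `stack`, `unstack`) and predicates follow the usual textbook encoding, and the returned list plays the part of the operator list sent to the world.

```python
from collections import deque
from itertools import permutations
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Fact = Tuple[str, ...]
State = FrozenSet[Fact]
HAND_EMPTY: Fact = ("handempty",)


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[Fact]     # preconditions
    add: FrozenSet[Fact]     # add list
    delete: FrozenSet[Fact]  # delete list


def make_op(name: str, pre, add, delete) -> Operator:
    return Operator(name, frozenset(pre), frozenset(add), frozenset(delete))


def blocks_operators(blocks: List[str]) -> List[Operator]:
    """Textbook blocks-world operators (a hypothetical 'blocks' unit)."""
    ops = []
    for b in blocks:
        on_table = {("clear", b), ("ontable", b), HAND_EMPTY}
        ops.append(make_op(f"pickup({b})", on_table, {("holding", b)}, on_table))
        ops.append(make_op(f"putdown({b})", {("holding", b)}, on_table, {("holding", b)}))
    for b, c in permutations(blocks, 2):
        stacked = {("on", b, c), ("clear", b), HAND_EMPTY}
        held = {("holding", b), ("clear", c)}
        ops.append(make_op(f"stack({b},{c})", held, stacked, held))
        ops.append(make_op(f"unstack({b},{c})", stacked, held, stacked))
    return ops


def plan(initial: State, goal: FrozenSet[Fact], ops: List[Operator]) -> Optional[List[str]]:
    """Breadth-first search: returns a shortest operator sequence, or None."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for op in ops:
            if op.pre <= state:
                nxt = (state - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None


blocks = ["a", "b", "c"]
initial = frozenset({("on", "c", "a"), ("ontable", "a"), ("ontable", "b"),
                     ("clear", "c"), ("clear", "b"), HAND_EMPTY})
goal = frozenset({("on", "a", "b"), ("on", "b", "c")})
print(plan(initial, goal, blocks_operators(blocks)))
# ['unstack(c,a)', 'putdown(c)', 'pickup(b)', 'stack(b,c)', 'pickup(a)', 'stack(a,b)']
```

Breadth-first search is chosen only because it is short and returns a shortest plan; it does not scale to large block sets.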

7  Conclusions 

The paper has presented the organization of an environment for prototyping and
experimenting with advanced robotic architectures. The environment is built upon
the integration of a real-time robotic machine with a full-fledged logic system.
Among the distinguishing features of the environment are:

- possibility of simulating sensor-driven tasks, exercising a real-time
controller which is identical to that of the physical robot;

- general mechanisms for agent interaction which may implement various
architectural paradigms currently investigated for intelligent robots;

- efficiency in rapid prototyping and in exploration of architectural
alternatives, due to the underlying logic-based environment enriched with
concurrency, modularity, and contexts.

In this work we focused our attention on mechanisms rather than on well-founded
robot programming models, since defining such models is precisely the
goal of several research efforts in robotics. Contextual mechanisms and low-level
process control primitives have been exploited to provide a flexible and
extendible basic architecture, supporting both robot-level and task-level
programs built according to prototyping techniques. Since the same mechanisms
can be used to build the robot programming environment, the application layer
can grow in a synergetic and uniform way together with its own user support.

While these mechanisms proved suitable for structuring purposes, an issue that
remains open in this approach is the degree to which the symbolic level of the
system should be affected by real-time requirements.

References 

[1] Albus J., McCain H., Lumia R.: NASA/NBS Standard Reference Model Telerobot
Control System Architecture, NIST Tech. Note 1235, NIST, Gaithersburg, MD,
July 1987.

[2] Crowley J. L.: Coordination of Action and Perception in a Surveillance Robot,
IEEE Expert, Vol. 2(4), Winter 1987.

[3] Brooks R. A.: A Robust Layered Control System for a Mobile Robot, IEEE Journal
of Robotics and Automation, Vol. RA-2, No. 1, 1986.

[4] Mataric M. J.: Integration of Representation Into Goal-Driven Behavior-Based
Robots, IEEE Trans. on Robotics and Automation, Vol. 8, No. 9, September 1992.

[5] Gat E.: Integrating Reaction and Planning in a Heterogeneous Asynchronous
Architecture for Mobile Robot Navigation, Proc. of AAAI Symposium on Integrated
Intelligent Architectures, ACM SIGART Bulletin, Vol. 2, No. 4, August 1991.

[6] Lyons D. M., Hendriks A. J.: Planning for Reactive Robot Behavior, IEEE Int.
Conf. on Robotics and Automation, Nice, France, May 1992.




[7] Miller D. J., Lennox R. C.: An Object-Oriented Environment for Robot System
Architectures, IEEE Int. Conf. on Robotics and Automation, Cincinnati, OH, May
1990.

[8] Fagg A. H., Lewis M. A., Iberall T., Bekey G. A.: R2AD: Rapid Robotics
Application Development Environment, Int. Conf. on Robotics and Automation,
Sacramento, CA, April 1991.

[9] Chen C. X., Trivedi M. M., Bidlack C. R.: Simulation and Graphical Interface for
Programming and Visualization of Sensor-Based Robot Operation, Int. Conf. on
Robotics and Automation, Nice, France, May 1992.

[10] Bonivento C., Faldella E., Vassura G.: The University of Bologna Robotic Hand
Project: Current State and Future Developments, ICAR'91, International
Conference on Advanced Robotics, Pisa, Italy, June 1991.

[11] Caselli S., Faldella E., Zanichelli F.: Grasp Synthesis for a Dextrous Robotic Hand
Using a Multi-Layer Perceptron, IEEE Melecon '91, Ljubljana, YU, May 1991.

[12] Hayward V., Paul R. P.: Robot Manipulator Control under Unix - RCCL: A Robot
Control “C” Library, Int. Journal of Robotics Research, Vol. 5, No. 4, 1986.

[13] Mello P., Natali A.: Programs as Collections of Communicating Prolog Units,
ESOP-86, LNCS n. 219, Springer-Verlag 1986.

[14] Gelernter D.: Generative Communication in Linda, ACM Trans. on Programming
Languages and Systems, Vol. 7, No. 1, January 1985.

[15] Mello P., Natali A.: Extending Prolog with Modularity, Concurrency, and
Metarules, New Generation Computing, Vol. 10, No. 4, August 1992.

[16] Monteiro L., Porto A.: Contextual Logic Programming, Proc. 6th ICLP, Lisbon,
Portugal, The MIT Press, 1989.

[17] ALPES Esprit Project n. P973: Final Technical Report, September 1989.

[18] Lozano-Pérez T.: Robot Programming, Proc. of the IEEE, Vol. 71, No. 7, 1983.

[19] Van Roy P., Despain A. M.: High Performance Logic Programming with the
Aquarius Prolog Compiler, Computer, Vol. 25, No. 1, January 1992.

[20] Brooks R. A.: The Behavior Language: User's Guide, AI Lab Memo No. 1227,
April 1990.

[21] Cavalieri M., Lamma E., Mello P., Natali A.: Meta-Programming through Direct
Introspection: a Comparison with Meta-Interpretation Techniques, in
“Meta-Programming in Logic Programming”, Abramson and Rogers eds., The MIT
Press, Cambridge, MA, 1989.


On  the  Complexity  of  the  Instance  Checking  Problem 
in  Concept  Languages  with  Existential  Quantification* 

Andrea  Schaerf 

Dipartimento di Informatica e Sistemistica
Università di Roma “La Sapienza”

Via Salaria 113, 00198 Roma, Italia
e-mail: aschaerf@aissi.ing.uniroma1.it


Abstract. Most of the work regarding complexity results for concept
languages considers subsumption as the prototypical inference. However,
when concept languages are used for building knowledge bases including
assertions on individuals, the basic deductive service of the knowledge base
is the so-called instance checking, which is the problem of checking whether an
individual is an instance of a given concept. We consider a particular concept
language, called ALE, and we address the question of whether instance
checking can really be solved by means of subsumption algorithms in this
language. In doing so, we indirectly ask whether considering subsumption as
the prototypical inference is justified. Our analysis, carried out considering
two different measures of complexity, shows that in ALE instance checking
is strictly harder than subsumption. This result singles out a new source of
complexity in concept languages, which does not show up when checking
subsumption between concepts.

1  Introduction 

Concept description languages (also called terminological languages or concept
languages) have been introduced with the aim of providing a simple and
well-established first-order semantics to capture the meaning of the most popular
features of structured representations of knowledge. In concept languages,
concepts are used to represent classes as sets of individuals, and roles are binary
relations used to specify their properties or attributes.

It is a common opinion that subsumption checking (i.e. checking whether one
concept necessarily denotes a subset of the other) is the central reasoning task in
concept languages. This has motivated a large body of research on the problem of
subsumption checking in different concept languages (e.g. [2, 4, 13]). However, if
concept languages are to be used for building knowledge bases including assertions
on individuals, the basic deductive service of the knowledge base is the so-called
instance checking, which is the problem of checking whether a set of assertions
implies that a given individual is an instance of a given concept.

*This work has been supported by the ESPRIT Basic Research Action N.3012-COMPULOG
and by the Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo of the CNR (Italian
Research Council).

In this paper we address the question of whether instance checking can be
easily solved by means of subsumption algorithms, and whether considering
subsumption as the prototypical inference is justified.

The outcome of the analysis is that the answer to the question in the general
case is no, and that there are cases where instance checking is strictly harder than
subsumption. In fact, the instance checking problem for the considered language
(ALE) turns out to be of higher complexity than the subsumption problem. This
result is all the more interesting since ALE is not an artificial language (defined
just to show that the complexity of subsumption may differ from that of instance
checking), but a rather natural language which has been investigated before.

This result singles out a new source of complexity in concept languages, which
does not show up when checking subsumption between concepts. A practical
implication is that any actual deduction procedure for reasoning on
structured knowledge representation systems cannot be based solely on
subsumption checking, but has to embed reasoning mechanisms that are not easily
reducible to subsumption. This must be considered when designing the
deductive services of a system for the development of applications based on concept
languages.

In our analysis we have assumed we are dealing with complete reasoners. On
the contrary, most of the existing systems (e.g. Loom [9], Classic [11], Back [12])
use incomplete reasoners¹ (except for KRIS [1]), and in these systems a careful
analysis of the relationship between subsumption and instance checking is lacking.
Nevertheless, in our opinion (also supported by other researchers, e.g. see [13]), the
study of complete methods gives more insight into the structure of the problem
and is also useful for the development of better incomplete reasoners.

Furthermore, we measure the complexity of instance checking by the two
measures suggested in [14]: data complexity (i.e. the complexity w.r.t. the size of
the knowledge base) and combined complexity (i.e. w.r.t. both the size of the
knowledge base and the size of the concept representing the query)². Another
result of our analysis is that there are cases where the complexities obtained using
the two measures differ substantially. This result proves that the distinction is
necessary and must be taken into account in order to understand the behavior
of the system in practical cases.

The paper is organized as follows. In Section 2 we provide some preliminaries
on concept languages, in particular on the language ALE. In Section 3 we deal
with the problem of instance checking in ALE. Discussion and conclusions are
drawn in Section 4. For the sake of brevity, some proofs are only sketched.


2 The Language ALE

We consider the language ALE [3, 13], which is an extension of the basic concept
language FL⁻ introduced in [2]. Besides the constructors of FL⁻, ALE includes
qualified existential quantification on roles and complementation of primitive
concepts.

¹ Classic is actually complete, but with respect to a nonstandard semantics.
² In [14] the so-called expression complexity (i.e. the complexity w.r.t. the size of the
concept representing the query) is also considered, but it seems less interesting in our setting.

Given an alphabet of primitive concept symbols and an alphabet of role
symbols, ALE-concepts (denoted by the letters C and D) are built up by
means of the following syntax rule, where A denotes a primitive concept and R
denotes a role (in ALE roles are always primitive):

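$$
\begin{array}{rcll}
C, D & \longrightarrow & A \mid & \text{(primitive concept)} \\
     &                 & \top \mid & \text{(top)} \\
     &                 & \bot \mid & \text{(bottom)} \\
     &                 & \neg A \mid & \text{(primitive complement)} \\
     &                 & C \sqcap D \mid & \text{(intersection)} \\
     &                 & \forall R.C \mid & \text{(universal quantification)} \\
     &                 & \exists R.C & \text{(qualified existential quantification)}
\end{array}
$$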

We have chosen the language ALE for several reasons. On the one hand, it is rich
enough to express a significant class of assertions, as we will see in the examples.
In addition, it is suitable for expressing concepts representing meaningful queries.
In particular, thanks to full existential quantification, it is possible to express
queries requiring some form of navigation in the knowledge base through the
assertions on roles. For example, the query “find the individuals who have a
friend whose child is a doctor” can be expressed as
$\exists\mathrm{FRIEND}.\exists\mathrm{CHILD}.\mathrm{Doctor}$. On
the other hand, ALE avoids other expressive constructors (such as disjunction
of concepts) which introduce other sources of complexity and whose treatment is
outside the scope of this paper.

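An interpretation $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ consists of a set $\Delta^\mathcal{I}$ (the domain of $\mathcal{I}$) and a
function $\cdot^\mathcal{I}$ (the interpretation function of $\mathcal{I}$) that maps every concept to a subset
of $\Delta^\mathcal{I}$ and every role to a subset of $\Delta^\mathcal{I} \times \Delta^\mathcal{I}$ such that the following equations
are satisfied:

$$
\begin{aligned}
\top^\mathcal{I} &= \Delta^\mathcal{I} \\
\bot^\mathcal{I} &= \emptyset \\
(\neg A)^\mathcal{I} &= \Delta^\mathcal{I} \setminus A^\mathcal{I} \\
(C \sqcap D)^\mathcal{I} &= C^\mathcal{I} \cap D^\mathcal{I} \\
(\forall R.C)^\mathcal{I} &= \{ d_1 \mid \forall d_2 : (d_1, d_2) \in R^\mathcal{I} \rightarrow d_2 \in C^\mathcal{I} \} \\
(\exists R.C)^\mathcal{I} &= \{ d_1 \mid \exists d_2 : (d_1, d_2) \in R^\mathcal{I} \wedge d_2 \in C^\mathcal{I} \}
\end{aligned}
$$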
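The equations above translate directly into an evaluator that computes $C^\mathcal{I}$ over a finite interpretation by structural recursion. The Python sketch below uses a hypothetical nested-tuple encoding of concepts and made-up individuals; note that it evaluates a concept in one given interpretation (model checking), which is much easier than the logical implication $\Sigma \models D(a)$ studied in this paper, where all models of $\Sigma$ must be considered.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical tuple encoding of ALE concepts:
#   ("atom", A) ("top",) ("bottom",) ("not", A) ("and", C, D)
#   ("all", R, C) ("some", R, C)
Concept = tuple


class Interpretation:
    """A finite interpretation: domain, concept and role extensions."""

    def __init__(self, domain: Set[str], concepts: Dict[str, Set[str]],
                 roles: Dict[str, Set[Tuple[str, str]]]) -> None:
        self.domain = frozenset(domain)
        self.concepts = concepts
        self.roles = roles

    def ext(self, c: Concept) -> FrozenSet[str]:
        """Compute C^I by structural recursion on the semantic equations."""
        kind = c[0]
        if kind == "atom":
            return frozenset(self.concepts.get(c[1], set()))
        if kind == "top":
            return self.domain
        if kind == "bottom":
            return frozenset()
        if kind == "not":  # ALE only complements primitive concepts
            return self.domain - frozenset(self.concepts.get(c[1], set()))
        if kind == "and":
            return self.ext(c[1]) & self.ext(c[2])
        if kind in ("all", "some"):
            pairs = self.roles.get(c[1], set())
            inner = self.ext(c[2])
            # R-successors of each d, i.e. {e | (d, e) in R^I}.
            succ = {d: {e for (x, e) in pairs if x == d} for d in self.domain}
            if kind == "all":
                return frozenset(d for d in self.domain if succ[d] <= inner)
            return frozenset(d for d in self.domain if succ[d] & inner)
        raise ValueError(f"unknown constructor {kind!r}")


interp = Interpretation(
    domain={"ann", "bob", "carl"},
    concepts={"Doctor": {"carl"}},
    roles={"FRIEND": {("ann", "bob")}, "CHILD": {("bob", "carl")}},
)
query = ("some", "FRIEND", ("some", "CHILD", ("atom", "Doctor")))
print(sorted(interp.ext(query)))  # ['ann']
```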

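An interpretation $\mathcal{I}$ is a model for a concept $C$ if $C^\mathcal{I}$ is nonempty. A concept
is satisfiable if it has a model and unsatisfiable otherwise.

We say that $C$ is subsumed by $D$, written $C \sqsubseteq D$, if $C^\mathcal{I} \subseteq D^\mathcal{I}$ for every
interpretation $\mathcal{I}$; and $C$ is equivalent to $D$, written $C \equiv D$, if $C^\mathcal{I} = D^\mathcal{I}$ for
every $\mathcal{I}$.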

The construction of knowledge bases using concept languages is realized by
permitting concept and role expressions to be used in assertions on individuals.
Let $\mathcal{O}$ be an alphabet of symbols denoting individuals; an ALE-membership
assertion (or simply assertion) is a statement of one of the forms

$$C(a) \qquad \text{or} \qquad R(a, b)$$

where $C$ is an ALE-concept, $R$ is a role, and $a, b$ are individuals in $\mathcal{O}$.

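In order to assign a meaning to the assertions, the interpretation function $\cdot^\mathcal{I}$
is extended to individuals in such a way that $a^\mathcal{I} \in \Delta^\mathcal{I}$ for each individual $a \in \mathcal{O}$
and $a^\mathcal{I} \neq b^\mathcal{I}$ if $a \neq b$ (Unique Name Assumption). The meaning of the above
assertions is now straightforward: if $\mathcal{I} = (\Delta^\mathcal{I}, \cdot^\mathcal{I})$ is an interpretation, $C(a)$ is
satisfied by $\mathcal{I}$ if $a^\mathcal{I} \in C^\mathcal{I}$, and $R(a, b)$ is satisfied by $\mathcal{I}$ if $(a^\mathcal{I}, b^\mathcal{I}) \in R^\mathcal{I}$.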

A set $\Sigma$ of ALE-assertions is called an ALE-knowledge base. An interpretation
$\mathcal{I}$ is said to be a model of $\Sigma$ if every assertion of $\Sigma$ is satisfied by $\mathcal{I}$. $\Sigma$ is said to
be satisfiable if it admits a model. We say that $\Sigma$ logically implies an assertion
$\alpha$ (written $\Sigma \models \alpha$) if $\alpha$ is satisfied by every model of $\Sigma$.

In the so-called terminological systems, the knowledge base also includes
an intensional part, called terminology, expressed in terms of concept
definitions. However, almost all implemented systems assume that such definitions are
acyclic, i.e. in the definition of a concept $C$ no reference (direct or indirect) to $C$
itself may occur. It is well known that any reasoning process over knowledge
bases comprising an acyclic terminology can be reduced to a reasoning process
over a knowledge base with an empty terminology (see [10] for a discussion of
this reduction and its complexity). For this reason, in our analysis we do not
take terminologies into account and, therefore, we conceive a knowledge base as
just a set of ALE-assertions.

Definition 2.1 We call instance checking in ALE the following problem: given
an ALE-knowledge base $\Sigma$, an ALE-concept $D$, and an individual $a$ in $\mathcal{O}$, check
if $\Sigma \models D(a)$ holds. Furthermore, denoting with $|\Sigma|$ the size of $\Sigma$ and with $|D|$
the size of $D$, we call data complexity (resp. combined complexity) of instance
checking in ALE the complexity of checking if $\Sigma \models D(a)$ w.r.t. $|\Sigma|$ (resp. $|\Sigma|$ and
$|D|$).

Notice that instance checking is a basic tool for more complex services. For
example, using instance checking it is possible to solve the so-called retrieval
problem: given a concept $D$, find all the individuals that are instances of $D$.
Retrieval can be performed simply by solving the instance checking problem
for each individual in $\Sigma$.
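Retrieval is then just a loop over the individuals mentioned in the knowledge base. In the Python sketch below the instance checker is passed in as a parameter, because a complete checker for ALE is exactly what the rest of the paper analyses; the assertion encoding and the placeholder checker `asserted_only`, which only looks up explicitly asserted facts and is therefore sound but incomplete, are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical assertion encoding: ("C", concept, a) or ("R", role, a, b).
Assertion = Tuple[str, ...]
Checker = Callable[[List[Assertion], str, str], bool]


def individuals(kb: List[Assertion]) -> Set[str]:
    """All individual names occurring in the knowledge base."""
    names: Set[str] = set()
    for assertion in kb:
        names.update(assertion[2:])
    return names


def retrieve(kb: List[Assertion], concept: str, instance_check: Checker) -> List[str]:
    """Retrieval: one instance check per individual in the knowledge base."""
    return sorted(a for a in individuals(kb) if instance_check(kb, concept, a))


def asserted_only(kb: List[Assertion], concept: str, a: str) -> bool:
    """Placeholder checker: sound but incomplete, sees only explicit C(a)."""
    return ("C", concept, a) in kb


kb: List[Assertion] = [
    ("R", "CHILD", "john", "mary"),
    ("C", "Female", "mary"),
    ("C", "Doctor", "sue"),
]
print(retrieve(kb, "Female", asserted_only))  # ['mary']
```

The cost of retrieval is thus the number of individuals times the cost of one instance check, which is why the complexity of instance checking matters directly.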

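Example 2.2 Let $\Sigma_1$ be the following ALE-knowledge base:

$$\Sigma_1 = \{\, \mathrm{CHILD}(\mathit{john}, \mathit{mary}),\ (\forall\mathrm{CHILD}.\neg\mathrm{Graduate})(\mathit{john}),\ (\mathrm{Female} \sqcap \exists\mathrm{CHILD}.\mathrm{Graduate})(\mathit{mary}) \,\}$$

It is easy to verify that $\Sigma_1$ is satisfiable. Notice also that the addition of the
assertion $\mathrm{Graduate}(\mathit{mary})$ to $\Sigma_1$ would make the knowledge base unsatisfiable;
in fact, due to the first two assertions, $\Sigma_1 \models \neg\mathrm{Graduate}(\mathit{mary})$. □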
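A short derivation, using only the semantic equations given above, makes the last claim explicit: for every model $\mathcal{I}$ of $\Sigma_1$,

$$
\begin{aligned}
&(\mathit{john}^\mathcal{I}, \mathit{mary}^\mathcal{I}) \in \mathrm{CHILD}^\mathcal{I}
 \ \text{and}\ 
 \mathit{john}^\mathcal{I} \in (\forall\mathrm{CHILD}.\neg\mathrm{Graduate})^\mathcal{I} \\
&\quad\Longrightarrow\quad
 \mathit{mary}^\mathcal{I} \in (\neg\mathrm{Graduate})^\mathcal{I} = \Delta^\mathcal{I} \setminus \mathrm{Graduate}^\mathcal{I},
\end{aligned}
$$

so no model of $\Sigma_1$ satisfies $\mathrm{Graduate}(\mathit{mary})$. The third assertion causes no conflict: the graduate child it requires for $\mathit{mary}$ is not subject to the universal restriction on the children of $\mathit{john}$.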


3 Instance checking in ALE

In [3] it is shown that checking the subsumption relation between two ALE-concepts
is an NP-complete problem and that checking the satisfiability of an
ALE-concept is coNP-complete. With regard to instance checking, we know that
it is at least as hard as subsumption. In fact, subsumption can be reduced to
instance checking in the following way: given two concepts $C$ and $D$, the
subsumption test $C \sqsubseteq D$ is performed by checking whether the knowledge base
composed of the single assertion $C(a)$ (where $a$ is any individual) logically
implies $D(a)$, i.e. checking if $\{C(a)\} \models D(a)$ holds. This result holds for instance
checking w.r.t. the combined complexity. In fact, the complexity of subsumption
is measured w.r.t. the size of both the candidate subsumee and the candidate
subsumer, which become respectively part of the knowledge base and part of
the query.

With regard to the data complexity, we know that a concept C is unsatisfiable if and only if the knowledge base {C(a)} implies every assertion, i.e. if {C(a)} ⊨ B(a) for every concept B. It follows that concept unsatisfiability can be reduced to instance checking w.r.t. data complexity (a single query B(a) of fixed size suffices, e.g. with B a primitive concept not occurring in C). Since concept satisfiability is coNP-complete, concept unsatisfiability is NP-complete. We can therefore conclude that instance checking is NP-hard even w.r.t. the data complexity†.
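Both reductions can be sketched in Python on top of a stand-in oracle. To keep the example runnable, the oracle `toy_instance_check` (a hypothetical name) handles only a toy fragment that is not ALE: concepts are conjunctions of atomic concepts and their negations, with no roles. The sketch therefore shows only the shape of the two reductions. Subsumption builds the knowledge base {C(a)} and queries D(a). Unsatisfiability queries a fresh primitive concept B, which is entailed exactly when {C(a)} has no model.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

# Toy fragment (an assumption, not ALE): a concept is a conjunction of literals.
Literal = Tuple[str, bool]        # ("A", True) is A, ("A", False) is ¬A
Concept = FrozenSet[Literal]
Assertion = Tuple[Concept, str]   # (C, a) is the assertion C(a)


def toy_instance_check(kb: Set[Assertion], query: Concept, a: str) -> bool:
    """Decide kb |= query(a) when every concept is a conjunction of literals."""
    facts: Dict[str, Set[Literal]] = defaultdict(set)
    for concept, ind in kb:
        facts[ind] |= concept
    # An individual forced into both X and ¬X makes kb unsatisfiable,
    # and an unsatisfiable knowledge base implies every assertion.
    for lits in facts.values():
        if any((name, not positive) in lits for name, positive in lits):
            return True
    # Otherwise any literal not forced on a is false in some model.
    return query <= facts[a]


def subsumes(c: Concept, d: Concept) -> bool:
    """C ⊑ D iff {C(a)} |= D(a) for an arbitrary individual a."""
    return toy_instance_check({(c, "a")}, d, "a")


def unsatisfiable(c: Concept) -> bool:
    """C is unsatisfiable iff {C(a)} |= B(a) for a primitive B not occurring in C."""
    used = {name for name, _ in c}
    fresh = "B"
    while fresh in used:
        fresh += "'"
    return toy_instance_check({(c, "a")}, frozenset({(fresh, True)}), "a")
```

For instance, `subsumes(frozenset({("A", True), ("B", True)}), frozenset({("A", True)}))` returns True, and `unsatisfiable(frozenset({("A", True), ("A", False)}))` returns True. In the second reduction only the knowledge base grows with C, which is why it applies to the data complexity.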

We now give a more precise characterization of the complexity of instance checking, w.r.t. both the data complexity and the combined complexity.

We start with a lower bound for the data complexity of instance checking in ALE. As shown above, instance checking in ALE is NP-hard. We prove that it is coNP-hard too. Since subsumption in ALE is NP-complete, a consequence of this result is that (assuming NP ≠ coNP) instance checking for ALE is strictly harder than subsumption.

This unexpected result shows that the instance checking problem in ALE suffers from a new source of complexity, which does not show up when checking subsumption between ALE-concepts. This new source of complexity is related to the use of qualified existential quantification in the concept representing the query, which makes the behavior of an individual heavily dependent on the other individuals in the knowledge base. The following example illustrates this point.

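Example 3.1 Let Σ₂ be the following ALE-knowledge base:

$$\Sigma_2 = \{\, \mathit{FRIEND}(\mathit{john}, \mathit{susan}),\ \mathit{FRIEND}(\mathit{john}, \mathit{peter}),\ \mathit{LOVES}(\mathit{susan}, \mathit{peter}),\ \mathit{LOVES}(\mathit{peter}, \mathit{mary}),\ \mathit{Graduate}(\mathit{susan}),\ \neg \mathit{Graduate}(\mathit{mary}) \,\}$$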

Consider now the following assertion:

$$\beta = (\exists \mathit{FRIEND}.(\mathit{Graduate} \sqcap \exists \mathit{LOVES}.\neg \mathit{Graduate}))(\mathit{john})$$

Asking whether Σ₂ ⊨ β means asking whether John has a graduate friend who loves a non-graduate person. At first glance, since Susan and Peter are the only known friends of John, it seems that the answer to the query is to be found by checking whether either Σ₂ ⊨ (Graduate ⊓ ∃LOVES.¬Graduate)(susan) or Σ₂ ⊨ (Graduate ⊓ ∃LOVES.¬Graduate)(peter) holds.

Since Σ₂ ⊭ Graduate(peter), it follows that Σ₂ ⊭ (Graduate ⊓ ∃LOVES.¬Graduate)(peter),


†Notice that this result implies the previous one. In fact, the combined complexity is obviously at least as high as the data complexity.




and since Σ₂ ⊭ (∃LOVES.¬Graduate)(susan), it follows that Σ₂ ⊭ (Graduate ⊓ ∃LOVES.¬Graduate)(susan).

Reasoning in this way would lead to the answer NO. On the contrary, the correct answer to the query is YES, and in order to find it one needs to reason by case analysis. In fact, the query asks whether in every model M of Σ₂ there is an individual, say a, such that FRIEND(john, a), Graduate(a) and (∃LOVES.¬Graduate)(a) are true in M. Obviously, in every model M of Σ₂, either Graduate(peter) or ¬Graduate(peter) is true. In the first case, it is easy to see that a is simply peter (and the non-graduate person he loves is mary), while in the second case a is susan (and the non-graduate person she loves is peter himself). Therefore, such an individual a exists in every model of Σ₂, and the query gets the answer YES.

Therefore, even though none of the individuals related to john through the role FRIEND is known to be in the condition requested by the query, the combination of the assertions on susan and peter in the knowledge base is such that in every model one or the other is in that condition. □
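The case analysis of Example 3.1 can be replayed by brute force. The sketch below is restricted to the named individuals and to the role assertions of Σ₂, and it enumerates only the fact that Σ₂ leaves open, namely Graduate(peter). This is not a general ALE reasoner, since models may contain further elements. It is enough here because in each case the witness is a named individual, which exists in every model. All function names are illustrative.

```python
from itertools import product
from typing import Dict

# Σ2 from Example 3.1, restricted to its named individuals.
FRIEND = {("john", "susan"), ("john", "peter")}
LOVES = {("susan", "peter"), ("peter", "mary")}
KNOWN = {"susan": True, "mary": False}   # Graduate(susan), ¬Graduate(mary)
OPEN = ["peter"]                          # Σ2 says nothing about Graduate(peter)


def witness_exists(graduate: Dict[str, bool]) -> bool:
    """Does john have a FRIEND who is Graduate and LOVES someone not Graduate?"""
    return any(
        graduate[x] and any(not graduate[y] for (z, y) in LOVES if z == x)
        for (j, x) in FRIEND if j == "john"
    )


def answer_by_case_analysis() -> bool:
    """YES iff a witness exists under every completion of the open facts."""
    for values in product([True, False], repeat=len(OPEN)):
        if not witness_exists({**KNOWN, **dict(zip(OPEN, values))}):
            return False
    return True


def answer_friend_by_friend() -> bool:
    """The flawed per-friend check: use only facts that Σ2 states outright."""
    return any(
        KNOWN.get(x) is True
        and any(KNOWN.get(y) is False for (z, y) in LOVES if z == x)
        for (j, x) in FRIEND if j == "john"
    )


if __name__ == "__main__":
    print(answer_friend_by_friend(), answer_by_case_analysis())  # False True
```

The friend-by-friend check answers NO, while enumerating the two cases for Graduate(peter) answers YES, as argued above.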

The previous example shows that, in order to answer a query involving qualified existential quantification, a sort of case analysis is required. We now show that this kind of reasoning makes instance checking in ALE coNP-hard w.r.t. the data complexity.

The proof is based on a reduction from a suitable variation of the propositional satisfiability problem (SAT) to instance checking in ALE. We define a 2+2-CNF formula on the alphabet P to be a CNF formula F such that each clause of F has exactly four literals, two positive and two negative ones, where the propositional letters are elements of P ∪ {true, false}. Furthermore, we call 2+2-SAT the problem of checking whether a 2+2-CNF formula is satisfiable. Using a reduction from 3-SAT (omitted here) we derive the following result: 2+2-SAT is NP-complete.

Given a 2+2-CNF formula F = C₁ ∧ C₂ ∧ ⋯ ∧ Cₙ, where Cᵢ = L¹ᵢ₊ ∨ L²ᵢ₊ ∨ ¬L¹ᵢ₋ ∨ ¬L²ᵢ₋, we associate with it an ALE-knowledge base Σ_F and a concept Q as follows. Σ_F has one individual l for each letter L in F, one individual cᵢ for each clause Cᵢ, one individual f for the whole formula F, plus two individuals true and false for the corresponding propositional constants. The roles of Σ_F are Cl, P₁, P₂, N₁, N₂, and the only primitive concept is A.

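$$
\begin{aligned}
\Sigma_F = \{\, & \mathit{Cl}(f, c_1),\ \mathit{Cl}(f, c_2),\ \ldots,\ \mathit{Cl}(f, c_n),\\
& P_1(c_1, l^1_{1+}),\ P_2(c_1, l^2_{1+}),\ N_1(c_1, l^1_{1-}),\ N_2(c_1, l^2_{1-}),\\
& \ldots,\\
& P_1(c_n, l^1_{n+}),\ P_2(c_n, l^2_{n+}),\ N_1(c_n, l^1_{n-}),\ N_2(c_n, l^2_{n-}),\\
& A(\mathit{true}),\ \neg A(\mathit{false}) \,\}
\end{aligned}
$$

$$
Q = \exists \mathit{Cl}.(\exists P_1.\neg A \sqcap \exists P_2.\neg A \sqcap \exists N_1.A \sqcap \exists N_2.A)
$$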

Intuitively, checking whether Σ_F ⊨ Q(f) corresponds to checking whether in every truth assignment for F there exists a clause whose positive literals are interpreted as false and whose negative literals are interpreted as true, i.e. a clause that is not satisfied.




Lemma 3.2 A 2+2-CNF formula F is unsatisfiable if and only if Σ_F ⊨ Q(f).

Proof. "⇒" Suppose F is unsatisfiable. Consider any model I of Σ_F, and let δ_I be the truth assignment for F such that δ_I(l) = true if and only if l^I ∈ A^I, for every letter l. Since F is unsatisfiable, there exists a clause Cᵢ that is not satisfied by δ_I. It follows that all the individuals related to cᵢ through the roles P₁, P₂ are in the extension (¬A)^I, and all the individuals related to cᵢ through the roles N₁, N₂ are in the extension A^I. Thus cᵢ^I ∈ (∃P₁.¬A ⊓ ∃P₂.¬A ⊓ ∃N₁.A ⊓ ∃N₂.A)^I, and consequently f^I ∈ Q^I. Therefore, since this argument holds for every model I of Σ_F, we can conclude that Σ_F ⊨ Q(f).

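"⇐" Suppose F is satisfiable, and let δ be a truth assignment satisfying F. Let I_δ be the interpretation for Σ_F that interprets each individual as itself and is defined as follows:

• A^{I_δ} = {l^{I_δ} | δ(l) = true},

• p^{I_δ} = {(a^{I_δ}, b^{I_δ}) | p(a, b) ∈ Σ_F} for p = Cl, P₁, P₂, N₁, N₂.

It is easy to see that I_δ is a model of Σ_F. Moreover, since every clause Cᵢ is satisfied by δ, no cᵢ^{I_δ} belongs to (∃P₁.¬A ⊓ ∃P₂.¬A ⊓ ∃N₁.A ⊓ ∃N₂.A)^{I_δ}, and hence f^{I_δ} ∉ Q^{I_δ}. Therefore, we can conclude that Σ_F ⊭ Q(f). □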
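The construction of Σ_F and the argument of Lemma 3.2 can be sketched in Python. The construction `build_kb` is polynomial. The check `entails_q` is an exponential brute-force illustration and not part of the reduction. It evaluates Q(f) only on the interpretations I_δ, one per truth assignment δ, which Lemma 3.2 justifies: Σ_F ⊨ Q(f) fails exactly when some I_δ is a counter-model. The encoding of clauses as 4-tuples and all function names are our own assumptions.

```python
from itertools import product
from typing import List, Set, Tuple

# A clause (p1, p2, n1, n2) stands for p1 ∨ p2 ∨ ¬n1 ∨ ¬n2;
# "true" and "false" denote the propositional constants.
Clause = Tuple[str, str, str, str]
RoleAssertion = Tuple[str, str, str]   # (role, subject, object)


def build_kb(clauses: List[Clause]) -> Tuple[List[RoleAssertion], List[str]]:
    """Return the role assertions of Σ_F plus its two concept assertions."""
    roles: List[RoleAssertion] = []
    for i, (p1, p2, n1, n2) in enumerate(clauses, start=1):
        c = f"c{i}"
        roles += [("Cl", "f", c), ("P1", c, p1), ("P2", c, p2),
                  ("N1", c, n1), ("N2", c, n2)]
    return roles, ["A(true)", "¬A(false)"]


def q_holds(roles: List[RoleAssertion], a_ext: Set[str]) -> bool:
    """Is f in ∃Cl.(∃P1.¬A ⊓ ∃P2.¬A ⊓ ∃N1.A ⊓ ∃N2.A) when A is interpreted as a_ext?"""
    def succ(role: str, x: str) -> List[str]:
        return [y for (r, s, y) in roles if r == role and s == x]
    return any(
        any(y not in a_ext for y in succ("P1", c))
        and any(y not in a_ext for y in succ("P2", c))
        and any(y in a_ext for y in succ("N1", c))
        and any(y in a_ext for y in succ("N2", c))
        for c in succ("Cl", "f")
    )


def entails_q(clauses: List[Clause]) -> bool:
    """Σ_F |= Q(f), decided on the interpretations I_δ (justified by Lemma 3.2)."""
    roles, _ = build_kb(clauses)
    letters = sorted({x for cl in clauses for x in cl} - {"true", "false"})
    for values in product([True, False], repeat=len(letters)):
        # A(true) and ¬A(false) are asserted, so "true" is always in A.
        a_ext = {"true"} | {x for x, v in zip(letters, values) if v}
        if not q_holds(roles, a_ext):
            return False   # δ satisfies F, so I_δ is a counter-model
    return True
```

For example, the clauses (x ∨ false ∨ ¬true ∨ ¬true) and (false ∨ false ∨ ¬x ∨ ¬true) encode x ∧ ¬x, and `entails_q` returns True for them. The first clause alone is satisfiable, and `entails_q` returns False. Note that Q does not depend on F, which is what makes the reduction work for the data complexity.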

Theorem 3.3 Instance checking in ALE is coNP-hard in the size of the knowledge base.

Proof. The thesis follows from Lemma 3.2, from the NP-completeness of 2+2-SAT (so that 2+2-CNF unsatisfiability is coNP-complete), and from the fact that, given a 2+2-CNF formula F, Q is fixed independently of F and Σ_F can be computed in polynomial time w.r.t. the size of F. □

The above reduction shows that coNP-hardness arises even if the knowledge base is expressed in a simple language; i.e. in order to obtain intractability it is sufficient to enrich only the query language with qualified existential quantification, while keeping a tractable assertional language. This result contrasts with the result reported in [7]: in that paper, a polynomial instance checking algorithm is provided for an AL-knowledge base (AL extends FL⁻ with primitive complements) using the query language QL (which is an extension of ALE). Unfortunately, as pointed out in [8], while that algorithm is sound, it is in fact not complete. In particular, it answers NO to the query β of Example 3.1.

Since the language ALE involves primitive complements and the above reduction makes use of them, it may seem that the coNP-hardness arises from the interaction of qualified existential quantification with primitive complements. On the contrary, we are able to show that the coNP-hardness is caused by qualified existential quantification alone. In fact, if we consider the language FLE⁻ (FL⁻ + qualified existential quantification), we are able to prove that instance checking in FLE⁻ is coNP-hard too. The intuition is that in FLE⁻ it is also possible to force reasoning by case analysis. In particular, this is done by considering two FLE⁻-concepts of the form ∃R.⊤ and ∀R.C (instead of A and ¬A in ALE), and exploiting the fact that their interpretations cover the entire domain, i.e. ∃R.⊤ ⊔ ∀R.C ≡ ⊤: an element either has an R-successor or vacuously satisfies ∀R.C.
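The covering property ∃R.⊤ ⊔ ∀R.C ≡ ⊤ can be checked mechanically on finite interpretations. The following Python sketch computes both extensions for randomly generated interpretations of one role R and one concept C. The random generator and its parameters are arbitrary choices for illustration. This is a sanity check, not a proof.

```python
import random
from typing import List, Set, Tuple


def covers_domain(domain: List[int], r: Set[Tuple[int, int]], c: Set[int]) -> bool:
    """Check that every element lies in (∃R.⊤)^I or in (∀R.C)^I."""
    exists_r_top = {x for (x, _) in r}
    forall_r_c = {x for x in domain if all(y in c for (z, y) in r if z == x)}
    return (exists_r_top | forall_r_c) == set(domain)


def random_interpretation(n: int, rng: random.Random):
    """A random finite interpretation of one role R and one concept C."""
    domain = list(range(n))
    r = {(x, y) for x in domain for y in domain if rng.random() < 0.3}
    c = {x for x in domain if rng.random() < 0.5}
    return domain, r, c


if __name__ == "__main__":
    rng = random.Random(0)
    print(all(covers_domain(*random_interpretation(6, rng)) for _ in range(1000)))  # True
```

Elements without R-successors fall into ∀R.C vacuously, and all the others fall into ∃R.⊤. In a given model, however, which of the two concepts an element belongs to may be left open by the knowledge base, and this is what forces the case analysis.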

In the following we give an upper bound on the data complexity of instance checking in ALE, proving that it is in Π₂ᵖ‡.

‡The class Π₂ᵖ, also denoted by coNP^NP, consists of the problems whose complement can be solved by a nondeterministic polynomial-time algorithm exploiting a nondeterministic polynomial oracle. For a discussion of the classes Σ₂ᵖ, Π₂ᵖ, and Δ₂ᵖ see, for example, [6].




Lemma 3.4 Let Σ be a satisfiable ALE-knowledge base, a be an individual, and D be an ALE-concept. Then checking whether Σ ⊭ D(a) can be done by a nondeterministic algorithm, polynomial in |Σ|, which exploits a nondeterministic polynomial oracle.

Proof (sketch). We know that Σ ⊨ D(a) if and only if the knowledge base Σ ∪ {¬D(a)} is unsatisfiable. In general, ¬D is not an ALE-concept, since it contains the complement of a non-primitive concept. Therefore, Σ ∪ {¬D(a)} is not an ALE-knowledge base, but a knowledge base of a more expressive language called ALC (see [13]), which extends ALE with general complements. In [1] an algorithm is presented for checking the satisfiability of an ALC-knowledge base. That algorithm consists of an NPTIME procedure which exploits a PSPACE subprocedure. It is possible to show that, in such an algorithm, if the size of the concept which uses general complements is fixed, then the subprocedure runs in NPTIME. Since we measure the data complexity, the size of ¬D can be considered fixed, and the claim follows. □

Theorem 3.5 Instance checking in ALE is in Π₂ᵖ w.r.t. the data complexity.

Proof. Easily follows from Lemma 3.4: the complementary problem, checking whether Σ ⊭ D(a), is in NP^NP = Σ₂ᵖ. □

Notice that the data complexity of instance checking is not yet precisely placed in the complexity hierarchy. We conjecture that it is Π₂ᵖ-complete.

We now show that instance checking in ALE is PSPACE-complete w.r.t. the combined complexity. The following lemma states the PSPACE-hardness of the problem; its long proof can be found in [5].

Lemma 3.6 Instance checking in ALE is PSPACE-hard w.r.t. the combined complexity.

Theorem 3.7 Instance checking in ALE is PSPACE-complete w.r.t. the combined complexity.

Proof. Lemma 3.6 states the PSPACE-hardness. Since ALE is a sublanguage of ALC, instance checking in ALE is obviously no harder than the PSPACE-complete problem of instance checking in ALC (see [1]); it follows that instance checking in ALE is also in PSPACE. Hence instance checking in ALE is PSPACE-complete. □


4  Discussion  and  Conclusions 

The following table summarizes the complexity of instance checking in ALE with respect to both the data complexity and the combined complexity, together with the previously known result regarding subsumption.

Problem                                     Complexity
subsumption                                 NP-complete [3]
instance checking (data complexity)         NP-hard, coNP-hard, in Π₂ᵖ
instance checking (combined complexity)     PSPACE-complete



These results single out several interesting properties:

1. Instance checking is not polynomially reducible to subsumption in the general case (assuming NP ≠ coNP). As a consequence, an algorithm for instance checking based on subsumption (and classification) must include some other complex reasoning mechanisms.

2. The data complexity and the combined complexity of instance checking in ALE are in different classes: the former is in Π₂ᵖ, the latter is PSPACE-complete. This fact highlights that, in order to have a realistic measure of the performance of our systems, we must pay attention to which data are crucial in the application.

3. Reasoning in ALE suffers from an additional source of complexity which does not show up when checking subsumption, even w.r.t. the data complexity. This new source of complexity is related to the use of qualified existential quantification in the concept representing the query, which requires some sort of reasoning by case analysis. In particular, due to this source of complexity, the instance checking problem (unlike subsumption) cannot be solved by means of one single level of nondeterministic choice.

4. With respect to the combined complexity, ALE is in the same class as more expressive languages (e.g. ALC, see [1]). Therefore, whenever the expressiveness of ALE is required, other constructors can be added without any increase in the combined complexity.

With regard to point 3, one may object that if the reasoning process underlying this source of complexity does not match the intuition of the user, as pointed out in Example 3.1, it should be ruled out by the reasoner (and by the semantics). On the contrary, there are cases in which this kind of reasoning seems to agree with intuition, and therefore it must be taken into account.


Acknowledgements 

I would like to thank Francesco Donini, Maurizio Lenzerini, and Daniele Nardi for discussions that contributed to the paper, and Franz Baader, Enrico Franconi, and Marco Schaerf for many helpful comments on earlier drafts. I also thank Yoav Shoham for his hospitality at the Computer Science Department of Stanford University, where part of this research was carried out.


References 

[1] F. Baader and B. Hollunder. A terminological knowledge representation system with complete inference algorithm. In Proc. of the Workshop on Processing Declarative Knowledge, PDK-91, Lecture Notes in Artificial Intelligence. Springer-Verlag, 1991.

[2] R. J. Brachman and H. J. Levesque. The tractability of subsumption in frame-based description languages. In Proc. of the 4th Nat. Conf. on Artificial Intelligence AAAI-84, 1984.

[3] F. M. Donini, B. Hollunder, M. Lenzerini, A. Marchetti Spaccamela, D. Nardi, and W. Nutt. The complexity of existential quantification in concept languages. Artificial Intelligence, 2-3:309-327, 1992.

[4] F. M. Donini, M. Lenzerini, D. Nardi, and W. Nutt. The complexity of concept languages. In J. Allen, R. Fikes, and E. Sandewall, editors, Proc. of the 2nd Int. Conf. on Principles of Knowledge Representation and Reasoning KR-91, pages 151-162. Morgan Kaufmann, 1991.

[5] F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. From subsumption to instance checking. Technical Report 15.92, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", 1992.

[6] M. Garey and D. Johnson. Computers and Intractability: A guide to NP-completeness. W.H. Freeman and Company, San Francisco, 1979.

[7] M. Lenzerini and A. Schaerf. Concept languages as query languages. In Proc. of the 9th Nat. Conf. on Artificial Intelligence AAAI-91, 1991.

[8] M. Lenzerini and A. Schaerf. Querying concept-based knowledge bases. In Proc. of the Workshop on Processing Declarative Knowledge, PDK-91, Lecture Notes in Artificial Intelligence. Springer-Verlag, 1991.

[9] R. MacGregor. Inside the LOOM description classifier. SIGART Bulletin, 2(3):88-92, June 1991.

[10] B. Nebel. Terminological reasoning is inherently intractable. Artificial Intelligence, 43:235-249, 1990.

[11] P. F. Patel-Schneider, D. McGuinness, R. J. Brachman, L. Alperin Resnick, and A. Borgida. The CLASSIC knowledge representation system: Guiding principles and implementation rationale. SIGART Bulletin, 2(3):108-113, June 1991.

[12] C. Peltason. The BACK system: an overview. SIGART Bulletin, 2(3):114-119, June 1991.

[13] M. Schmidt-Schauß and G. Smolka. Attributive concept descriptions with complements. Artificial Intelligence, 48(1):1-26, 1991.

[14] M. Vardi. The complexity of relational query languages. In 14th ACM Symp. on Theory of Computing, pages 137-146, 1982.


Mutual  Knowledge* 


S.  Ambroszkiewicz,** 

Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, ul. Ordona 21, POLAND, E-mail: sambrosz@plearn.bitnet


Abstract. The notions of an agent's behavior and of fundamental knowledge for decision making are introduced. Then it is shown how mutual knowledge, i.e. knowledge of the form: agent i₁ knows that agent i₂ knows that ... that agent iₙ knows that ..., is used by rational agents. Following Wittgenstein (1958), this use is regarded as the meaning of this knowledge.


1  Introduction. 

Let us suppose that there are two agents who are involved in a game. Each of them has his own knowledge about the game, about his opponent and, generally, about his own world, which may influence his decision. The situation is extremely interesting when these two worlds overlap. Then the agents themselves, as well as their knowledge, their knowledge about that knowledge, and so on, become the subjects of the knowledge. To illustrate the problem, let us present a story¹ told by Jerzy Łoś to Alfred Tarski as early as 1959.

Two warships, say A and B, were fighting each other. Each of the captains of the ships (for short, captain A and captain B) hired a spy on shore and paid him $1 for any information of any importance to the fight. By coincidence, not so rare in reality, it happened that the two spies were one and the same person. It also happened that the radar of ship A was out of order. The spy learned this and immediately sent the news to captain B, and of course got $1. Soon afterwards he sent captain A the following message: captain B knows that your radar is out of order. Later on he sent to captain B: captain A knows that you know that his radar is out of order. And so on. The problem is: how much money could the spy collect?

In fact, the problem the story poses is up to what stage of iteration the mutual knowledge delivered by the spy is important to the captains. Could the spy get an infinite amount of money? Importance here means that the knowledge influences a captain's behavior.

Hence, the problem is ill-posed until the war game and the patterns of behavior of the captains are specified. So we must first answer the following questions: What are the possible patterns of behavior? and How does mutual knowledge influence these behaviors?

* This work was supported by KBN grant No. 210979101.

** My thanks are due to Professor J. Łoś for inspiration and many helpful discussions.
¹ Authorized by J. Łoś.




Another important problem raised by the story is the meaning of the knowledge. Of course we can build model-theoretic or possible-worlds semantics for mutual knowledge; however, the story suggests a much more straightforward semantics based on the following claim:

One's knowledge is meaningful (for him) if it may influence his decision.

According to this claim, the question What does it mean for us? is equivalent to How do we use it? So meaningless will be the same as useless.

Actually, this approach to semantics is not novel. It is a version of Wittgenstein's view:

43. For a large class of cases - though not for all - in which we employ the word "meaning" it can be defined thus: the meaning of a word is its use in the language. And the meaning of a name is sometimes explained by pointing to its bearer.

Wittgenstein (1958), pp. 20-21

Since meaning is use, and use serves to achieve some goal, there must also be a criterion for reaching this goal. So we are very close to the theory of decision making. In fact, Wittgenstein's use is nothing but playing a so-called language game.

Let us present an example of a simple language game adapted from Wittgenstein (1958). Assume that there are two agents: agent 1 (speaker) and agent 2 (hearer). Agent 1 has the actions a₁ ∈ A₁, whereas agent 2 has the actions a₂ ∈ A₂.

The actions of agent 1 are commands for agent 2 to take an appropriate action. Agent 2's knowledge has the following form: agent 1 tells me a₁. He understands a command if in response he takes an appropriate action. The behavior of agent 2 may be described as follows:

if agent 1 tells me a₁, then I will take one action from the set A'₂ ⊆ A₂.

Hence it is behavior that constitutes the meaning of that knowledge. Let us note that this somewhat resembles the procedural representation of knowledge. Actually, a procedural representation defines a partial behavior of the agent.

The exact meaning of commands is coordinated in a learning process that consists in repetition of the game. It is worth noticing that mutual knowledge is important for this process. We may again cite Wittgenstein:

210. "But do you really explain to the other person what you yourself understand? Don't you get him to guess the essential thing? You give him examples, - but he has to guess their drift, to guess your intention." - Every explanation which I can give myself I can give to him too. - "He guesses what I intend" would mean: various interpretations of my explanation come to his mind, and he lights on one of them. So in this case he should ask; and I should answer him.

Wittgenstein (1958), pp. 83-84

We may also assume that each agent has preferences on the results of the actions; however, here we want to focus mainly on behavior.

Generally, agent i's behavior is defined as an action reduction, say Bᵢ, such that, given knowledge, say K, and the set Aᵢ of possible actions of agent i, Bᵢ(Aᵢ, K) is a subset of Aᵢ.




We will assume that this behavior is defined only for fundamental knowledge, whereas other kinds of relevant knowledge (i.e. knowledge that may influence the agent's decision) should be transformable into the fundamental one. However, in order to define fundamental knowledge and behavior explicitly, we should specify the situation, as in Łoś's story. In the case of decision making these two notions are relatively straightforward to define (see the next section).

The introduction of the notions of fundamental knowledge and behavior distinguishes this work from the vast literature on this subject: Hintikka 1962, McCarthy 1979, Moore 1977 and 1980, Konolige 1981, Fagin 1984, Kraus and Lehmann 1986, Aiello, Nardi and Schaerf 1988, Halpern and Moses 1990, to mention only a few.

Rational behavior (in the sense of decision theory) has recently been appreciated as complementary to logical inference in reasoning; see Doyle, Levesque and Moore 1988, and Doyle 1990. In the next section we will specify patterns of rational behavior and the war game in order to give a solution to the problem posed in Łoś's story.

As to the semantic aspects of knowledge, there is a growing need in the AI community for a notion of meaning independent of those delivered by Tarskian and possible-worlds semantics; see for example Winograd and Flores 1986, Rosenfield 1988, Clancey 1991, Hewitt 1991 and Gasser 1991. It seems that an approach to semantics based on Wittgenstein's idea promises to meet this need. In the last section of the paper we will try to build such a semantics just by showing how an agent uses his mutual knowledge.

Actually, a similar idea was introduced in the so-called Social Conceptions of Knowledge and Action, Gasser (1991), and in Open Information Systems Semantics, Hewitt (1991); let us quote from Gasser (1991):

The  notion  that  the  meaning  of  a  message  is  the  response  it  generates  in  the  system 
that  receives  it  was  introduced  by  Mead  (1934)  and  was  later  used  independently  in 
the  context  of  computing  by  Hewitt  (1977).  Using  this  conceptualization,  a  message 
that  provokes  no  response  has  no  meaning,  and  each  message  with  impact  has  a 
specific  meaning,  played  out  as  a  set  of  specific  response  behaviors. 

There the notion of mutual knowledge is crucial for defining one of their fundamental notions, namely webs of commitments or systems commitments; let us again quote from Gasser (1991):


This  approach  rests  on  a  somewhat  untraditional  idea  of  what  an  agent  is:  in  Gerson 
formulation,  an  agent  is  a  reflexive  collection  of  processes  involved  in  many  situations. 
To  varying  degrees,  the  agent  -  that  is,  some  component  process  of  the  agent  -  can 
take  the  viewpoint  of  any  participant  in  those  situations. 

Hence, the notions of agent's behavior and fundamental knowledge introduced in this paper, which serve to show how the agent uses his mutual knowledge, seem to be important.




2  Patterns  of  rational  behavior. 

The general form of mutual knowledge is the following:

agent i₁ knows that ... agent iₖ knows that Eₖ takes place,     (1)

where Eₖ is some event and k = …. This is called agent i₁'s knowledge of depth n.

Generally, by an agent's behavior we mean a transformation that, given some knowledge, reduces the agent's set of actions. Any knowledge that may cause this reduction will be called relevant, so we allow Eₖ in (1) to be any event that may be called relevant, even one as peculiar as spots on the Sun, provided it is shown how it influences the decision of the agent.

Let us consider the simplest yet nontrivial decision problem: a two-person noncooperative game in normal form,

$$G = (A_1, A_2, u_1, u_2),$$

where Aᵢ is the set of all possible actions of agent i, and uᵢ : A₁ × A₂ → ℝ is agent i's payoff. The game is played as follows: the agents take actions a₁, a₂ respectively, then they receive payoffs u₁(a₁, a₂) and u₂(a₁, a₂) respectively. It is supposed that each of them wants to maximize his payoff.

In our opinion, in the game context there are only two kinds of relevant knowledge that are fundamental. The first is about the payoff, whereas the second is about the opponent's actions. The rest of the relevant knowledge, whatever it may be, must be transformable into these two kinds. The motivation behind this is that the agent's behavior defined below depends only on these two kinds of knowledge, i.e. what the agent takes into account for his final decision is his payoff and the possible actions of the opponent, because his payoff depends on the action taken by the opponent.

For simplicity we assume here that each agent knows his payoff precisely.

Now let us consider the agent's knowledge about the opponent's actions. Suppose that agent 1 knows that in the play his opponent (i.e. agent 2) will choose an action from the set A'₂ ⊆ A₂. Let this supposition be called deterministic (as opposed to the probabilistic one considered later). Then agent 1's behavior consists in reducing his set of actions to the optimal ones, say A'₁ ⊆ A₁, given this supposition. So we may write A'₁ = b₁(A₁, u₁, A'₂), where b₁ is to be the behavior of agent 1. This reduction b₁ (called agent 1's behavior) depends on his own payoff (it is supposed that the agent tries to maximize his payoff) and on the opponent's actions. Hence the elements of the set A'₁ are meant as agent 1's optimal actions given this supposition.

Another approach to agent's behavior is based on probabilistic suppositions. Assume that agent 1 has a belief about which action will be chosen by the opponent. In other words, he has an a priori probability distribution, say μ, on the set A₂, so that for any a₂ ∈ A₂, μ(a₂) is the measure of his belief that agent 2 will choose the action a₂. Then the behavior of agent 1 consists in reducing, on the basis of this probabilistic supposition and his utility, his action set to some subset A'₁ ⊆ A₁. So we may write A'₁ = B₁(A₁, u₁, μ), where B₁ is to be the behavior of agent 1. Elements of A'₁ are meant as the optimal actions given this supposition. If they maximize the expected payoff relative to this distribution, then this behavior is called Bayesian. However, the belief of agent 1 has nothing to do with the real play; that is, the game is played only once, and agent 2's choice in this play is independent of what that belief is. Since after the play the agents will get actual payoffs, not expected ones, it does not seem reasonable to optimize agent 1's utility relative to μ, that is, to play one of the actions that maximize the expected payoff. This optimization is meaningful only when the game is repeated many times, but this changes the situation completely.

Despite the criticism presented above, it seems that beliefs play an important role in decision making. While a deterministic supposition may be interpreted as direct information on the opponent's choice, a probabilistic supposition may be interpreted as beliefs coming from previous observations of how often particular actions have been chosen by the opponent in similar situations. Once we assume that the agent has his own belief before the play, it is not decided in advance that the agent must maximize the expected payoff relative to that belief.

Although these two kinds of behavior are different, they do not contradict each other, so we may also consider a third kind that is a mixture of them.

To make things concrete, we will specify some behaviors. For the second kind we choose the Bayesian behavior, because so far it is the only well-established behavior based on probabilistic suppositions. Let $\mu$ be a probability distribution on $A_j$. For simplicity we assume that the letters $i, j$ denote different agents, i.e. $i = 1$ iff $j = 2$, and $i = 2$ iff $j = 1$. Formally, this behavior is defined as follows:

$$B_i(A_i, u_i, \mu) = \{a_i \in A_i : a_i \text{ is a best response to } \mu\},$$

where $a_i$ is a best response to $\mu$ if $a_i$ maximizes the function

$$f(a_i) = \sum_{a_j \in A_j} u_i(a_i, a_j)\,\mu(a_j).$$
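As an illustration only, the Bayesian behavior $B_i$ can be sketched in Python as follows. The payoff table `U1` and the action names `x1`, `x2`, `x3`, `y1`, `y2` are hypothetical, and exact fractions are used so that ties between maximizers of $f$ are detected reliably.

```python
from fractions import Fraction
from typing import Callable, Dict, Iterable, Set


def bayesian_behavior(A_i: Iterable[str], A_j: Iterable[str],
                      u_i: Callable[[str, str], int],
                      mu: Dict[str, Fraction]) -> Set[str]:
    """B_i(A_i, u_i, mu): the actions a_i that maximize
    f(a_i) = sum over a_j in A_j of u_i(a_i, a_j) * mu(a_j)."""
    opponent = list(A_j)  # reused for every a_i
    f = {a_i: sum((u_i(a_i, a_j) * mu[a_j] for a_j in opponent), Fraction(0))
         for a_i in A_i}
    best = max(f.values())
    return {a_i for a_i, value in f.items() if value == best}


# Hypothetical payoff table u_1(a_1, a_2) of agent 1.
U1 = {("x1", "y1"): 0, ("x1", "y2"): 3,
      ("x2", "y1"): 3, ("x2", "y2"): 0,
      ("x3", "y1"): 2, ("x3", "y2"): 2}


def u1(a1: str, a2: str) -> int:
    return U1[(a1, a2)]


uniform = {"y1": Fraction(1, 2), "y2": Fraction(1, 2)}
skewed = {"y1": Fraction(1, 10), "y2": Fraction(9, 10)}
print(bayesian_behavior(["x1", "x2", "x3"], ["y1", "y2"], u1, uniform))  # {'x3'}
print(bayesian_behavior(["x1", "x2", "x3"], ["y1", "y2"], u1, skewed))   # {'x1'}
```

With the uniform belief the sure action `x3` is selected; with a belief concentrated on `y2` the selection switches to `x1`, which shows how strongly $B_i$ depends on $\mu$.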

The behavior based on deterministic suppositions that we are going to introduce is the following:

$$b_i(A_i, u_i, A'_j) = \{a_i \in A_i : \exists a_j \in A'_j \text{ such that } a_i \text{ is a best response to } a_j \text{ in } A_i\},$$

where $a_i$ is a best response to $a_j$ if

$$\forall a'_i \in A_i : u_i(a_i, a_j) \ge u_i(a'_i, a_j).$$
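A minimal sketch of the deterministic behavior $b_i$, assuming that payoffs are supplied as a Python function; the function name `best_response_behavior` and the small payoff table are hypothetical. Ties are kept, since every maximizer satisfies the $\ge$ condition above.

```python
from typing import Callable, Collection, Set


def best_response_behavior(A_i: Collection[str], A_j_prime: Collection[str],
                           u_i: Callable[[str, str], int]) -> Set[str]:
    """b_i(A_i, u_i, A'_j): keep a_i if, for some a_j in the supposition A'_j,
    u_i(a_i, a_j) >= u_i(a'_i, a_j) for every a'_i in A_i."""
    kept: Set[str] = set()
    for a_j in A_j_prime:
        best = max(u_i(a_i, a_j) for a_i in A_i)
        # every maximizer is a best response, so ties are all kept
        kept |= {a_i for a_i in A_i if u_i(a_i, a_j) == best}
    return kept


# Hypothetical payoff table u_1(a_1, a_2) of agent 1.
U1 = {("x1", "y1"): 4, ("x1", "y2"): 0,
      ("x2", "y1"): 1, ("x2", "y2"): 1,
      ("x3", "y1"): 0, ("x3", "y2"): 2}


def u1(a1: str, a2: str) -> int:
    return U1[(a1, a2)]


print(sorted(best_response_behavior(["x1", "x2", "x3"], ["y1", "y2"], u1)))  # ['x1', 'x3']
print(sorted(best_response_behavior(["x1", "x2", "x3"], ["y2"], u1)))        # ['x3']
```

Note that `x2`, which guarantees a payoff of 1, is rejected because it is not a best response to any action in the supposition.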

In words, it consists in rejecting (as not optimal) the actions that are not best responses to any action from the supposition. This behavior can hardly be considered rational unless we assume that the agent has no risk aversion. To see this, consider the following payoff table of agent 1:


«1 

“2 

a{ 

0 

3 

3 

0 

«! 

2  +  t 

2  +  t 

where $a_1^k$ (resp. $a_2^k$) are the actions of agent 1 (resp. agent 2), and $0 < \epsilon < 1$. Note that if $\epsilon$ is close to 1, then the agent should choose $a_1^3$, which gives him $2 + \epsilon$ for sure, whereas his other actions might give him a bit more, namely 3, but he cannot be sure of that. Yet $a_1^3$ is a best response neither to $a_2^1$ nor to $a_2^2$. On the other hand, if the number 3 in the table means 3 dollars (a small amount of money), then despite the risk of getting nothing it is reasonable to choose a best response, i.e. $a_1^1$ or $a_1^2$. Hence the extreme behavior that consists in eliminating actions that are not best responses may be considered the behavior of a rational agent who is a gambler, i.e. who has no risk aversion regardless of the payoff. It is still an open problem how to measure risk aversion and how this aversion influences the agent's behavior.
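The claims about this table can be checked with a short sketch, assuming $\epsilon = 9/10$ (a hypothetical value close to 1). It computes the actions kept by $b_1$ and, for comparison, the payoff each action guarantees whatever agent 2 does; the helper names are illustrative and not part of the paper.

```python
from fractions import Fraction
from typing import Dict, List, Set


def payoff_table(eps: Fraction) -> Dict[str, List[Fraction]]:
    """Agent 1's payoffs from the table; each list holds the payoffs
    against a_2^1 and a_2^2, in that order."""
    return {
        "a_1^1": [Fraction(0), Fraction(3)],
        "a_1^2": [Fraction(3), Fraction(0)],
        "a_1^3": [2 + eps, 2 + eps],
    }


def best_responses(table: Dict[str, List[Fraction]]) -> Set[str]:
    """Actions kept by b_1: best responses to at least one action of agent 2."""
    n_columns = len(next(iter(table.values())))
    kept: Set[str] = set()
    for c in range(n_columns):
        best = max(row[c] for row in table.values())
        kept |= {a for a, row in table.items() if row[c] == best}
    return kept


def guaranteed_payoff(table: Dict[str, List[Fraction]]) -> Dict[str, Fraction]:
    """Worst-case payoff of each action, whatever agent 2 chooses."""
    return {a: min(row) for a, row in table.items()}


table = payoff_table(Fraction(9, 10))
print(sorted(best_responses(table)))        # ['a_1^1', 'a_1^2']
print(guaranteed_payoff(table)["a_1^3"])    # 29/10
```

So $b_1$ discards $a_1^3$ even though it is the only action that guarantees more than 0.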

Now let us come back to Łoś's story. Assume that if the radar is out of order, then the game $G$ given by the following table is played:


$$\begin{array}{c|ccc} G & b_1 & b_2 & b_3 \\ \hline a_1 & -2,3 & -1,0 & 1,0 \\ a_2 & 0,0 & 1,2 & 0,0 \\ a_3 & 1,1 & 0,2 & 0,0 \end{array}$$

where $a_1, a_2, a_3$ are the actions of captain A, whereas $b_1, b_2, b_3$ are captain B's actions. The first (resp. second) number in each cell is the payoff to the first (resp. second) agent.

Let us assume that the behaviors of the captains are $b_1$ and $b_2$, respectively; that is, captain A takes the place of agent 1, and captain B takes the place of agent 2. Assume that this fact was common knowledge between them in the sense of Aumann (1976). Note that the mutual knowledge delivered by the spy was about the game they played. In other words, the event $E_k$ in (1) is: the game $G$ is played.

Although captain A did know, of course, that his own radar was out of order, i.e. that the game $G$ was played, he could not eliminate any of his actions until he learned more about the enemy's knowledge and behavior. When captain B learned that the game $G$ was played, then, according to his rational behavior, he removed action $b_3$, which is not a best response to any of captain A's actions in game $G$. When captain A learned this, i.e. that captain B knew that the game $G$ was played, then, knowing his opponent's behavior, he also knew how B had reduced his actions. Hence, on the basis of this reasoning, captain A removed action $a_1$, which is no longer a best response once action $b_3$ is removed. When, in turn, captain B learned that $a_1$ was removed, he also removed action $b_1$, so that the only action left to him was $b_2$. When captain A learned that his enemy would take action $b_2$, he would, of course, take action $a_2$. Hence we see that the knowledge delivered by the spy was important to the fight up to the first stage at which all remaining actions were best responses. So the spy could collect only 4 dollars.
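The captains' reasoning can be replayed mechanically. The sketch below assumes that each captain applies the best-response behavior to the opponent's current action set, that the reductions alternate starting with captain B as in the story, and that the process stops once two consecutive steps remove nothing; all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Game G: PAYOFF[(a, b)] = (payoff to captain A, payoff to captain B).
PAYOFF: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("a1", "b1"): (-2, 3), ("a1", "b2"): (-1, 0), ("a1", "b3"): (1, 0),
    ("a2", "b1"): (0, 0), ("a2", "b2"): (1, 2), ("a2", "b3"): (0, 0),
    ("a3", "b1"): (1, 1), ("a3", "b2"): (0, 2), ("a3", "b3"): (0, 0),
}


def utility(agent: int, own: str, other: str) -> int:
    """Payoff to `agent` (0 = captain A, 1 = captain B) playing `own` against `other`."""
    pair = (own, other) if agent == 0 else (other, own)
    return PAYOFF[pair][agent]


def reduce_actions(agent: int, own: Set[str], other: Set[str]) -> Set[str]:
    """Best-response behavior: keep each action in `own` that is a best response
    (among `own`) to at least one action still in the opponent's set `other`."""
    kept: Set[str] = set()
    for b in other:
        best = max(utility(agent, a, b) for a in own)
        kept |= {a for a in own if utility(agent, a, b) == best}
    return kept


def replay_story(first: int = 1) -> Tuple[List[Tuple[str, List[str]]], Set[str], Set[str]]:
    """Alternate the captains' reductions (captain B first, as in the story) until
    two consecutive steps remove nothing; return the removal stages and final sets."""
    names = ["A", "B"]
    sets = [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}]
    stages: List[Tuple[str, List[str]]] = []
    agent, idle = first, 0
    while idle < 2:
        reduced = reduce_actions(agent, sets[agent], sets[1 - agent])
        if reduced == sets[agent]:
            idle += 1
        else:
            idle = 0
            stages.append((names[agent], sorted(sets[agent] - reduced)))
            sets[agent] = reduced
        agent = 1 - agent
    return stages, sets[0], sets[1]


stages, left_a, left_b = replay_story()
for who, removed in stages:
    print(who, "removes", removed)
print(sorted(left_a), sorted(left_b))   # ['a2'] ['b2']
```

Under these assumptions the sketch reports four removal stages ($b_3$, $a_1$, $b_1$, then $a_3$), after which only $a_2$ and $b_2$ remain, in line with the four steps of the story.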

This shows how relevant (mutual) knowledge is transformed into fundamental knowledge; in this case, it is transformed into knowledge about the opponent's actions.




3  A  semantics  for  mutual  knowledge 

The aim of this section is, first, to present another general multi-agent decision problem which confirms that the notions of agent's behavior and fundamental knowledge are essential, i.e. that without these notions we cannot show how agents use their mutual knowledge; and second, to stress again that this use may be regarded as the meaning of this knowledge.

Suppose that there are two agents involved in a competitive game, say $G = (A_1, A_2, u_1, u_2)$. Each of the agents has a computer and software that assists him in decision making; in other words, agent $i$ has his own computer system, say $S_i$. Let us consider, for example, $S_1$. It works as follows: given information on agent 1's payoff and on which actions (say, from a subset $A'_2 \subseteq A_2$) may be chosen by the opponent (i.e. agent 2), system $S_1$ suggests that agent 1 restrict his actions to a subset $A'_1 \subseteq A_1$. Actually, this system reflects agent 1's behavior, i.e. it is an input-output system such that, given fundamental knowledge as input, it generates as output a subset of $A_1$.

We assume that each agent has put all his rules of behavior into his system, so that he relies on it completely. For this reason we will identify an agent's system with his behavior.

For simplicity we assume that each agent knows his own payoff precisely and that this knowledge is stored in his system, so that all relevant knowledge should be transformable into knowledge about the opponent's actions.

Now  we  are  ready  to  present  another  story: 

- It happened that agent 2 got a pirated copy of $S_1$, so that, by running it on his computer, he learned the subset $A'_1 \subseteq A_1$, i.e. the action reduction done by his opponent. Then he ran his own $S_2$ with the knowledge of $A'_1$ as input, thereby reducing his actions to the set $A'_2$.

- Suppose that agent 1 learned this. However, this knowledge was irrelevant to him until he knew how the opponent had used the copy of his system, i.e. until he knew the set $A'_2$, which, in turn, he could compute only if he got a copy of $S_2$. So suppose that he managed to get a pirated copy of $S_2$.

- Then he was able to simulate the reasoning of his opponent, obtaining the set $A'_2$. Next he applied his own system $S_1$ to $A'_2$, reducing his actions to the set $A''_1$.

- Suppose that agent 2 learned that his opponent (agent 1) had a copy of his system and that agent 1 knew that agent 2 had a copy of $S_1$. Then he simulated the reasoning of agent 1, so that he learned the set $A''_1$. Then, with this set as input, he ran his system (i.e. $S_2$), thereby reducing his actions to the set $A''_2$.

- And so on.

Let us analyze the story in detail. In this story the mutual knowledge is about the agents' systems, i.e. about their behaviors, and at the same time about the game played. In Łoś's story the mutual knowledge was only about the game, so in order to get a solution we had to assume that the specified behaviors of the captains were intuitive common knowledge. Each of the items above represents some agent's mutual knowledge and shows how the agent uses it. Let us write it down explicitly. This knowledge may be listed as follows:




(1) (a) agent 2 knows the system $S_1$.

(2) (a) agent 1 knows that agent 2 knows the system $S_1$,
    (b) agent 1 knows the system $S_2$.

(3) (a) agent 2 knows that agent 1 knows that agent 2 knows the system $S_1$,
    (b) agent 2 knows that agent 1 knows the system $S_2$.

And so on.


It is extremely interesting here that each kind of mutual knowledge alone (i.e. either (a) about system $S_1$ or (b) about system $S_2$) is important for the agent only at depth 1, but together they may be useful at depths greater than 1 as well.

It is worth noticing that after getting a pirated copy of $S_1$, agent 2's system is no longer $S_2$ but a new system, say $S'_2$, defined as follows:

$$S'_2 = S_2 + \text{a copy of } S_1$$

The knowledge about $S_1$ is transformed, just by running system $S_1$, into knowledge about agent 1's actions, which, in turn, is used as the input for $S_2$ to restrict agent 2's actions.

So agent 1, after getting the copy of $S_2$ and the knowledge that his opponent already has a copy of his system, in fact knows the new system of agent 2, i.e. the system $S'_2$. Running it on his computer, he simulates the reasoning of agent 2, so that he learns the action reduction done by agent 2. This knowledge is, in turn, an input to $S_1$. Therefore the new system of agent 1, say $S'_1$, now consists of the original one (i.e. $S_1$) plus $S'_2$.

Hence, after each step of knowledge iteration, a new system is actually created that consists of the original one enriched by the current system of the opponent.

In this way (see the picture below) we get a nested sequence of systems resulting from the use of the mutual knowledge (a) of depth $n$ and (b) of depth $n - 1$.


These systems transform mutual knowledge into fundamental knowledge. We claim that these nested systems represent the meaning of the mutual knowledge.
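One way to picture the nested systems is as function composition: each enriched system feeds the output of the opponent's current system into the agent's original one. The sketch below is an illustration under stated assumptions, not the paper's formalization: it models each $S_i$ as the best-response behavior $b_i$ and, hypothetically, reuses the payoffs of game $G$ from Section 2; `make_system`, `enrich` and `nested_outputs` are invented names.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Actions = FrozenSet[str]
System = Callable[[Actions], Actions]   # opponent's possible actions -> own reduced actions
Thunk = Callable[[], Actions]           # an enriched system needs no further input

# Hypothetical choice: the payoffs of game G, PAYOFF[(a, b)] = (payoff to 1, payoff to 2).
PAYOFF: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("a1", "b1"): (-2, 3), ("a1", "b2"): (-1, 0), ("a1", "b3"): (1, 0),
    ("a2", "b1"): (0, 0), ("a2", "b2"): (1, 2), ("a2", "b3"): (0, 0),
    ("a3", "b1"): (1, 1), ("a3", "b2"): (0, 2), ("a3", "b3"): (0, 0),
}


def make_system(own: Actions, utility: Callable[[str, str], int]) -> System:
    """A system S_i modelled as the best-response behavior b_i over `own`;
    utility(own_action, opponent_action) is agent i's payoff."""
    def system(opponent: Actions) -> Actions:
        kept = set()
        for b in opponent:
            best = max(utility(a, b) for a in own)
            kept |= {a for a in own if utility(a, b) == best}
        return frozenset(kept)
    return system


def enrich(original: System, opponent_current: Thunk) -> Thunk:
    """New system = original system run on the output of the opponent's current system."""
    return lambda: original(opponent_current())


def nested_outputs(S1: System, S2: System, A2: Actions, depth: int) -> List[Actions]:
    """Outputs of S_2', S_1', S_2'', S_1'', ... (depth systems in total)."""
    base: Thunk = lambda: A2                  # nothing is known yet about agent 2
    current = enrich(S2, enrich(S1, base))    # S_2' = S_2 + a copy of S_1
    outputs = [current()]
    originals = [S1, S2]
    turn = 0                                  # next: S_1' = S_1 + S_2'
    for _ in range(depth - 1):
        current = enrich(originals[turn], current)
        outputs.append(current())
        turn = 1 - turn
    return outputs


A1 = frozenset({"a1", "a2", "a3"})
A2 = frozenset({"b1", "b2", "b3"})
S1 = make_system(A1, lambda a, b: PAYOFF[(a, b)][0])
S2 = make_system(A2, lambda b, a: PAYOFF[(a, b)][1])
for k, out in enumerate(nested_outputs(S1, S2, A2, 4), start=1):
    print(k, sorted(out))
```

Each new system wraps the previous one, so the depth of nesting grows by one with every step of knowledge iteration; with these payoffs the outputs shrink to $\{b_1, b_2\}$, $\{a_2, a_3\}$, $\{b_2\}$ and finally $\{a_2\}$.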

4  Conclusion 

The notions of behavior and fundamental knowledge are crucial for the semantics of multi-agent systems proposed in this paper. It would be interesting to find general definitions for these notions.

For the sake of simplicity, we have not presented a complete formal description of the nested systems.

The cases considered above are very special: they are in fact examples of decision problems where the agents know the opponent's system precisely. It is not clear how to build a general framework even for decision problems with incomplete mutual knowledge, or for multi-agent systems with cooperation and communication.

It seems that Wittgenstein's idea is far from fully explored. In particular, it would be interesting to see how to define such semantics for complex notions.

Finally, we regard mutual knowledge as extremely important in learning and, more generally, in multi-agent systems.


References 

1. L. Aiello, D. Nardi and M. Schaefer: Yet another solution to the three wisemen puzzle. In: Proc. of ISMIS'88, Z. Ras and L. Saitta (Eds.) (1988) 398-407

2. R. J. Aumann: Agreeing to Disagree. Annals of Statistics 4 (1976) 1236-1239

3. W.J. Clancey: Book review of ref. 5. Artificial Intelligence 50 (1991) 241-284

4. J. Doyle, H.J. Levesque and R.C. Moore: Panel: Logicality vs. Rationality. In: Proc. of the Second Conf. on Theoretical Aspects of Reasoning about Knowledge. Morgan Kaufmann (1988) 343-364

5. J. Doyle: Rationality and its Roles in Reasoning. In: Proc. AAAI-90 (1990) 1093-1100

6. R. Fagin, J. Halpern and M.Y. Vardi: A model-theoretic analysis of knowledge: preliminary report. In: Proc. IEEE Symposium on Foundations of Computer Science (1984) 268-278

7. L. Gasser: Social conception of knowledge and action: DAI foundations and open system semantics. Artificial Intelligence 47 (1991) 107-138

8. E. M. Gerson: Scientific work and social worlds. Knowledge 4 (1977) 357-377

9. J.Y. Halpern and Y.O. Moses: Knowledge and Common Knowledge in a Distributed Environment. J. ACM 37(3) (1990) 549-587

10. C. Hewitt: Viewing control structures as patterns of passing messages. Artificial Intelligence 8 (1977) 323-364

11. C. Hewitt: Open Information Systems Semantics for Distributed Artificial Intelligence. Artificial Intelligence 47 (1991) 79-106

12. J. Hintikka: Knowledge and Belief. Cornell University Press, Ithaca, New York (1962)

13. K. Konolige: A first-order formalization of knowledge and action for a multiagent planning system. Machine Intelligence 10 (1981)

14. S. Kraus and D. Lehmann: Knowledge, Belief and Time. In: Proc. 13th ICALP (L. Kott, ed.), Rennes, Springer, LNCS 226 (1986) 186-195

15. J. McCarthy: First-order theories of individual concepts and propositions. Machine Intelligence 9 (1979) 120-147

16. G. H. Mead: Mind, Self and Society. University of Chicago Press, IL (1934)

17. R. Moore: Reasoning about Knowledge and Action. In: Proc. of IJCAI-83 (1983) 382-384

18. R. Moore: Reasoning about Knowledge and Action. SRI Int. Technical Note 337, Menlo Park, CA (1984)

19. I. Rosenfield: The Invention of Memory: A New View of the Brain. Basic Books, New York (1988)

20. T. Winograd and F. Flores: Understanding Computers and Cognition. Ablex, Norwood, NY (1986)

21. L. Wittgenstein: Philosophical Investigations. Basil Blackwell, Oxford (1958)


Expressive  Extensions  to  Inheritance  Networks* 


Krishnaprasad  Thirunarayan 

Department of Computer Science and Engineering
Wright  State  University,  Dayton,  OH  45435. 
Email:  tkprasad@cs.wright.edu 


Abstract. Even though much work has gone into explicating the semantics of inheritance networks, no consensus seems to have emerged among researchers about their precise semantics. In fact, there are several different possible interpretations of the same network topology. So, from a practical knowledge representation standpoint, in the absence of any independently verifiable semantics, we wish to pursue the approach of enhancing the language of traditional inheritance networks to enable the user to choose among the various available options and to program in a more complete description of the input problem in the enriched language. This approach permits the representation of certain networks that were not representable previously, and allows making subtle distinctions among networks that were not hitherto possible. In this paper we propose an annotated inheritance network language, develop its formal semantics by harmoniously amalgamating a family of related “local” inheritance theories, and discuss some implementation issues.


1  Introduction 

Inheritance networks (abbreviated as INs in the sequel) form an important class of data structures for knowledge representation. Much work has gone into explicating their semantics, as evidenced by the contemporary literature (see [1], [3], [5], [6], [7], [8], [12], [13], [15], [17], [19], [21]). However, no consensus seems to have emerged among researchers about the precise semantics of INs. That being the case, several families of related inheritance theories have been proposed, each justified on the basis of intuitive examples. In other words, INs have been given several different semantics based on different possible interpretations of the same topology. A closer scrutiny of these issues reveals that the differences among some of these interpretations stem from the various choices that have been made about the available options [16], [22]. So, from a practical knowledge representation standpoint, in the absence of an independently verifiable semantics [12], it makes sense to enhance the language of INs to make some of these available choices explicit. With this enhancement, the user can program in a more complete description of the input problem using the enriched language, especially the information that might have been unknowingly glossed over because of a less expressive language.

*  This  research  was  supported  in  part  by  the  NSF  grant  IRI-9009537. 


Our goal in this paper is very modest. We do not wish to propose an entirely new semantics of inheritance, but instead to extend the semantics given in
[8]  to  enable  making  subtle  distinctions  among  networks  that  would  not  have 
been  possible  otherwise.  Thus,  instead  of  having  traditional  networks  and  a 
number  of  different  but  related  inheritance  theories,  we  propose  to  amalgamate 
harmoniously  a  family  of  these  theories  in  order  to  enable  more  complete  and 
satisfactory  representation.  In  other  words,  in  place  of  having  several  different 
monochromatic  pictures  each  having  a  different  color,  we  wish  to  develop  an 
integrated  framework  that  will  permit  painting  many  multi-coloured  pictures. 

In  Section  2,  we  motivate  the  different  expressive  features  to  be  incorporated 
in  the  enhanced  inheritance  networks.  In  Section  3,  we  describe,  in  detail,  the 
formal  syntax  and  semantics  of  annotated  inheritance  networks  (abbreviated  as 
AINs in the sequel). We then briefly discuss issues related to efficient implementation. We summarize our conclusions in Section 4.

2  Motivation  for  Expressive  Enhancements 

We  motivate  the  features  supported  by  our  AINs  through  examples. 

A word about the notation used in Figure 1. Ordinary arrows stand for defeasible arcs, while bold arrows stand for strict arcs. Bidirectional arrows support contrapositive reasoning, while unidirectional arrows do not. The end of an arrow carrying the negation mark refers to the negation of the property, while the unmarked end refers to the property itself. See Section 3 for a detailed explanation.

- Specificity Relation: Inheritance conflicts in networks that support both multiple inheritance and exceptions can be resolved using class-subclass relationships implicit in the network. For example, typically, mammals are not aquatic. Whales are aquatic mammals. Given that Shamu is a whale, we conclude that Shamu is aquatic. This is because Shamu being a whale provides more specific information than Shamu being a mammal does. See Figure 1(a).
There are a number of different semantics of INs that differ primarily in the implicit specificity they infer from the topology of the network. The inheritance theories in [9] determine specificity by computing inheritance information. For example, if $r(n)$ and $q(n)$ hold, $q$'s are $p$'s, $r$'s are $\neg p$'s, and $r$ inherits $q$, then $r$ is more specific than $q$; hence $n$ inherits $\neg p$ via $r$ rather than $p$ via $q$. This notion of local specificity (denoted as $\prec$) can be visualized as an ordering of the in-arcs into a property node. See Figure 1(b).

- Preferential Inheritance: Preferential networks allow the representation of certain unambiguous situations whose traditional representation resembles an ambiguous network [7]. This extension can be integrated with our theories by imposing additional preferential constraints on the ordering of the in-arcs (also denoted as $\prec$) into property nodes. For example, typically, undergraduate students are unemployed, while undergraduate teachers are employed. However, undergraduate teaching assistants are employed. Thus, even though the IN representing these facts resembles a Nixon diamond, the ambiguity about the employment status of undergraduate teaching assistants can be resolved in favor of inheriting employed through undergraduate teachers over inheriting unemployed through undergraduate students. See Figure 1(c).

- Strict and Defeasible Arcs: It is useful to distinguish between strict and defeasible arcs [6]. For example, typically, native speakers of German are not born in America. Persons born in Pennsylvania are born in America. Given that Hermann is both a native speaker of German and born in Pennsylvania, we can conclude that he is born in America. In this example, the strict conclusion that Hermann is born in America overrides the defeasible conclusion that Hermann is not born in America. See Figure 1(d).


Fig.  1.  Examples 


- Skeptical vs. Credulous Interpretation of Ambiguity: In the presence of conflicting evidence about whether or not $n$ is a $p$, there are two possible ways to interpret the ambiguity. In the credulous case, we arbitrarily pick one of the two conclusions, $p(n)$ or $\neg p(n)$, and proceed from there; in the skeptical case, we refrain from drawing any conclusion about the $p$-ness of $n$. For example, according to the testimony of the defendant, Judge Clarence Thomas, he was not guilty, while according to the testimony of his accuser, Professor Anita Hill, he was guilty. Typically, the guilty are punished, while the innocent are acquitted. In this ambiguous situation, the Senate vote was used to pick the “credulous” conclusion that acquitted Judge Thomas. In contrast, consider the example of a Chinese graduate student in a U.S. university. Typically, Chinese students are weak in English, while graduate students in U.S. universities are strong in English. In the context of determining whether to award a teaching assistantship or a research assistantship to such a student, we prefer to remain “skeptical” in order to further investigate the person's proficiency in English, rather than arbitrarily choose an inappropriate conclusion.

- Symmetric Arcs: In INs, a positive (resp. negative) defeasible arc from $p$ to $q$ propagates to $q$ (resp. $\neg q$) only those individuals that possess $p$ positively, provided that they are not exceptional. We generalize these arcs to support the propagation of individuals that possess property $p$ negatively, so that they inherit property $q$ positively or negatively. This extension makes the language constructs more symmetrical, enabling the representation of such statements as: Unemployed persons do not pay taxes (see Figure 1(e)); Land-dwellers have lungs and no gills, while aquatic animals have gills but no lungs (see Figure 1(f)).

-  Contraposition:  A  number  of  defaults  can  be  contraposed  in  the  same  way 
as  strict  rules.  For  example,  typically,  birds  fly,  and,  typically,  non-flying 
objects  are  not  birds.  However,  there  are  cases  where  it  is  counter-intuitive 
to  contrapose  defaults.  For  instance,  if  it  can  be  shown  that  someone  is 
not  guilty  then  one  can  conclude  that  the  same  person  is  not  a  suspect. 
But,  if  someone  is  a  suspect,  it  does  not  follow  that  the  person  is  guilty. 
Similarly, people are normally not diabetic, but diabetics are people. See Figure 1(g)(h).

As can be observed, there is no single formalism in the literature that can represent all these examples. The formalism we develop in the sequel can satisfactorily represent all the examples given above in a unified framework.
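To make criteria such as specificity, strict-over-defeasible, and the skeptical versus credulous reading concrete, here is a deliberately simplified Python sketch that resolves conflicting evidence for one individual and one property. It is a hypothetical toy, not the AIN semantics developed in Section 3: the `Evidence` record, the explicit `subclass` pairs, and picking the first listed conclusion in the credulous case are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Evidence:
    source: str    # class through which the conclusion about property p is inherited
    holds: bool    # True: evidence for p(n); False: evidence for not-p(n)
    strict: bool   # True if the whole supporting path is strict


def resolve(evidence: List[Evidence], subclass: Set[Tuple[str, str]],
            credulous: bool) -> Optional[bool]:
    """Toy conflict resolution; `subclass` holds (r, q) when r is more specific than q.
    Returns True / False for p(n) / not-p(n), or None when no conclusion is drawn."""
    strict_values = {e.holds for e in evidence if e.strict}
    if strict_values:
        # strict overrides defeasible; conflicting strict evidence stays unresolved
        return strict_values.pop() if len(strict_values) == 1 else None
    # drop evidence contradicted by evidence coming from a more specific class
    survivors = [e for e in evidence
                 if not any(o.holds != e.holds and (o.source, e.source) in subclass
                            for o in evidence)]
    values = {e.holds for e in survivors}
    if len(values) == 1:
        return values.pop()
    if not values:
        return None
    # genuine ambiguity: credulous picks one (here, the first listed), skeptical abstains
    return survivors[0].holds if credulous else None


shamu = [Evidence("Whale", True, False), Evidence("Mammal", False, False)]
hermann = [Evidence("NativeGermanSpeaker", False, False),
           Evidence("BornInPennsylvania", True, True)]
student = [Evidence("ChineseStudent", False, False),
           Evidence("USGradStudent", True, False)]
print(resolve(shamu, {("Whale", "Mammal")}, credulous=False))  # True
print(resolve(hermann, set(), credulous=False))                # True
print(resolve(student, set(), credulous=False))                # None
print(resolve(student, set(), credulous=True))                 # False
```

The three calls mirror Figure 1(a), Figure 1(d) and the graduate-student example; in a real network the specificity pairs would be derived from the topology rather than supplied by hand.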


3  Annotated  Inheritance  Networks 


3.1  Language 

We  extend  the  syntax  of  INs  to  AINs  to  incorporate  features  described  above. 


Definition 1. An annotated inheritance network is an ordered directed acyclic graph consisting of

- a set of nodes that can be partitioned into two sets: the set of individual nodes $I$ and the set of property nodes $P$;

- a boolean function cred_skep : $P \rightarrow$ boolean, which specifies whether a credulous or a skeptical meaning of the node is desired;

- a set of arcs $A$ and a labelling function arctype that assigns to each arc a label of one of three kinds (strict, defeasible non-contraposable, or defeasible contraposable), each carrying one of the sign pairs $++$, $+-$, $-+$, $--$;

- for each node $p \in P$, an asymmetric preferential inheritance relation $\prec_p$ on its in-arcs.


Informally, for a property node $p$, if cred_skep($p$) holds, then we associate a credulous meaning with $p$; otherwise we associate a skeptical meaning with $p$. The strict arc labelled $++$ (resp. $+-$, $-+$, $--$) from node $p$ to node $q$ represents a strict arc from $p$ to $q$ (resp. from $p$ to $\neg q$, from $\neg p$ to $q$, from $\neg p$ to $\neg q$).




Similarly, defeasible arcs are labelled with one of the sign pairs $++$, $+-$, $-+$, $--$; the unidirectional ones represent defeasible arcs that do not support contraposition, and the bidirectional ones represent defeasible arcs that do support contraposition. In the literature, there are at least two different interpretations of contraposition with respect to defeasible arcs. In [15], the defeasible arcs are oriented and the forward conclusions override the conflicting contrapositive conclusions, whereas in [4] the defeasible arcs are truly symmetrical with respect to contraposition. So, if we subscribe to the former view, the contraposable arcs can be oriented from bottom to top or from left to right. We also require that, if the source node of an arc is an individual node in $I$, then the arc is strict and its label has a positive source sign, i.e. it is one of the strict labels $++$ or $+-$. In each label, the first sign is the sign of the source node and the second is the sign of the destination node; if a sign is missing, it is assumed to be $+$.
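The ingredients of Definition 1 can be captured in a small data structure. In the sketch below, the encoding of each arc label as a kind (strict, defeasible, defeasible contraposable) plus a sign pair, the representation of $\prec_p$ as a set of (weaker, stronger) arc pairs in `prefer`, and all class and method names are assumptions made for illustration; `validate` checks the constraints stated above: disjoint node sets, valid sign pairs, strict positive-source arcs out of individual nodes, asymmetric preferences on in-arcs only, and acyclicity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Kind(Enum):
    STRICT = "strict"                        # bold arrow
    DEFEASIBLE = "defeasible"                # ordinary unidirectional arrow: no contraposition
    DEFEASIBLE_CONTRA = "defeasible-contra"  # ordinary bidirectional arrow: contraposition


SIGN_PAIRS = {"++", "+-", "-+", "--"}


@dataclass(frozen=True)
class Arc:
    src: str
    dst: str
    kind: Kind
    signs: str = "++"   # first character: sign of the source, second: sign of the destination


@dataclass
class AIN:
    individuals: Set[str]
    properties: Set[str]
    cred_skep: Dict[str, bool]               # True = credulous, False = skeptical
    arcs: List[Arc] = field(default_factory=list)
    prefer: Dict[str, Set[Tuple[Arc, Arc]]] = field(default_factory=dict)  # (weaker, stronger)

    def validate(self) -> None:
        if self.individuals & self.properties:
            raise ValueError("I and P must be disjoint")
        nodes = self.individuals | self.properties
        for arc in self.arcs:
            if arc.src not in nodes or arc.dst not in nodes:
                raise ValueError(f"unknown node in {arc}")
            if arc.signs not in SIGN_PAIRS:
                raise ValueError(f"bad sign pair in {arc}")
            if arc.src in self.individuals and (arc.kind is not Kind.STRICT or arc.signs[0] != "+"):
                raise ValueError(f"arcs leaving an individual must be strict, positive source: {arc}")
        for p, pairs in self.prefer.items():
            in_arcs = {a for a in self.arcs if a.dst == p}
            for weaker, stronger in pairs:
                if weaker not in in_arcs or stronger not in in_arcs:
                    raise ValueError(f"preference on {p} must relate in-arcs of {p}")
                if (stronger, weaker) in pairs:
                    raise ValueError(f"preference on {p} is not asymmetric")
        self._check_acyclic()

    def _check_acyclic(self) -> None:
        succ: Dict[str, List[str]] = {}
        for arc in self.arcs:
            succ.setdefault(arc.src, []).append(arc.dst)
        state: Dict[str, int] = {}           # 1 = on the DFS stack, 2 = finished

        def visit(n: str) -> None:
            state[n] = 1
            for m in succ.get(n, []):
                if state.get(m) == 1:
                    raise ValueError(f"cycle through {m}")
                if m not in state:
                    visit(m)
            state[n] = 2

        for n in list(succ):
            if n not in state:
                visit(n)


# Figure 1(d), encoded under the assumptions above.
net = AIN(
    individuals={"Hermann"},
    properties={"NativeGerman", "BornPA", "BornAmerica"},
    cred_skep={"BornAmerica": False},
    arcs=[
        Arc("Hermann", "NativeGerman", Kind.STRICT),
        Arc("Hermann", "BornPA", Kind.STRICT),
        Arc("NativeGerman", "BornAmerica", Kind.DEFEASIBLE, "+-"),
        Arc("BornPA", "BornAmerica", Kind.STRICT),
    ],
)
net.validate()   # passes: no exception is raised
```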


3.2  Semantics 

We specify the semantics of AINs in two parts: the strict part and the defeasible part. Note that a strict conclusion propagates through a strict arc as
a  strict  conclusion,  while  it  propagates  through  a  defeasible  arc  as  a  defeasible 
conclusion.  In  contrast,  a  defeasible  conclusion  propagates  through  all  arcs  as  a 
defeasible  conclusion. 

Let $\Gamma$ be an AIN and let $x, y, \ldots$ denote the nodes in $\Gamma$. In the sequel, notation of the form $x \Rightarrow z$, $x \rightarrow z$, etc. refers to a path from $x$ to $z$ rather than a direct arc. Whenever we want it to refer to an arc from $x$ to $z$, we will say so explicitly.

To define inheritance, we make the notion of a supported path rigorous. Informally, a path consisting only of strict arcs is supported. In Figure 1(d), the strict path from node H to node BA via node BP is supported. However, not all defeasible paths are supported. In Figure 1(b), the defeasible path from node N to node P via node Q, which provides some evidence in support of N possessing P, is not supported, because it is overridden by the conflicting defeasible path from node N to node P via node R. To determine defeasible paths and to resolve conflicts among them, we formalize the auxiliary notions of has_evidence_for and of defeat. Finally, the definition of supports for defeasible paths deals with the interpretation of ambiguity.

We introduce the supports relation, denoted as $\rhd$, below.

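Definition 2. $\Gamma \rhd x \Rightarrow y$ if one of the following holds:

- (direct) there is a strict arc from $x$ to $y$.

- (forward) $\Gamma \rhd x \Rightarrow z$, and there is a strict arc from $z$ to $y$.

- (forward) $\Gamma \rhd x \Rightarrow \neg z$, and there is a strict arc from $\neg z$ to $y$.

- (contrapositive) $\Gamma \rhd x \Rightarrow z$, and there is a strict arc from $\neg y$ to $\neg z$.

- (contrapositive) $\Gamma \rhd x \Rightarrow \neg z$, and there is a strict arc from $\neg y$ to $z$.

Similarly, we can define $\Gamma \rhd x \Rightarrow \neg y$ (resp. $\neg x \Rightarrow y$, $\neg x \Rightarrow \neg y$).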
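A minimal sketch of the strict part of the supports relation, assuming that every strict arc may be used contrapositively as in the rules above. Literals are hypothetical `(node, sign)` pairs; the closure starts from the strict arcs leaving the individual $x$ and applies the forward and contrapositive rules until nothing new is derived.

```python
from typing import Iterable, List, Set, Tuple

Literal = Tuple[str, bool]   # ("p", True) stands for p, ("p", False) for not-p
StrictArc = Tuple[Literal, Literal]


def neg(lit: Literal) -> Literal:
    return (lit[0], not lit[1])


def strictly_supported(x: str, strict_arcs: Iterable[StrictArc]) -> Set[Literal]:
    """All literals l with Γ ▷ x ⇒ l, computed as a fixpoint of the
    direct, forward and contrapositive rules (all strict arcs contrapose)."""
    arcs: List[StrictArc] = list(strict_arcs)
    supported: Set[Literal] = set()
    # (direct) strict arcs leaving the individual node x
    frontier: List[Literal] = [dst for src, dst in arcs if src == (x, True)]
    while frontier:
        lit = frontier.pop()
        if lit in supported:
            continue
        supported.add(lit)
        for src, dst in arcs:
            if src == lit:          # (forward): x ⇒ src and an arc src → dst
                frontier.append(dst)
            if neg(dst) == lit:     # (contrapositive): x ⇒ ¬dst and an arc src → dst
                frontier.append(neg(src))
    return supported


# Strict arcs of Figure 1(d): H -> NG, H -> BP, BP -> BA.
hermann = [(("H", True), ("NG", True)), (("H", True), ("BP", True)),
           (("BP", True), ("BA", True))]
print(sorted(strictly_supported("H", hermann)))
# Contraposition: x -> not-q and p -> q together give x => not-p.
print(sorted(strictly_supported("x", [(("x", True), ("q", False)),
                                      (("p", True), ("q", True))])))
# [('p', False), ('q', False)]
```

The defeasible arc from NG to not-BA in Figure 1(d) is deliberately left out, since only strict arcs take part in this part of the relation.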




Note that even though in the real world there cannot exist an $x$ that possesses both $y$ and $\neg y$, there can still be such inconsistencies in the input supplied by the user to the inheritance reasoner. In such situations, we tolerate the inconsistency and localize its effect [2] [18].

In order to specify defeasible conclusions, we need to explain the conflict resolution strategies used for disambiguation. Briefly, we use the following criteria: (1) A strict conclusion always overrides the corresponding conflicting defeasible conclusion (see Figure 1(d) in Section 2). (2) A more specific defeasible conclusion overrides a corresponding less specific conflicting defeasible conclusion (see Figure 1(a) in Section 2). (3) Furthermore, in some approaches a “forward” defeasible conclusion overrides a conflicting “contrapositive” defeasible conclusion [8] [15] [3], while in others the contrapositive defeasible arcs are truly symmetrical [4].

We explain informally the specificity relationship used in [9] to resolve
conflicts among defeasible conclusions. See Figure 1(b). If r inherits q, then for all
nodes n, and for all property nodes p which are parents of q and r, the
inheritance of ¬p by n via r dominates over conflicting inheritance of p by n via q. This
is denoted by making the arc from q to p ≺_p the arc from r to p. Similarly, if ¬r
inherits q, then for all nodes n, and for all property nodes p which are parents
of q and r, the inheritance of p by n by virtue of being a ¬r dominates over
conflicting inheritance of p by n by virtue of being a q. Observe also that the
specificity relation and the inheritance relation are defined mutually recursively.
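One direction of this rule can be sketched as follows: given a parent map and an already-computed set of pairs "r inherits q", it lists which arcs into a shared property node p are ranked weaker. Because the paper defines specificity and inheritance mutually recursively, taking `inherits` as a fixed input is a simplifying assumption; the sketch also ignores the ¬r case and arc polarity. The function name and data layout are hypothetical.

```python
def specificity_order(parents: dict[str, set[str]],
                      inherits: set[tuple[str, str]]) -> set[tuple[tuple[str, str], tuple[str, str]]]:
    """Pairs ((q, p), (r, p)) meaning the arc q->p is weaker (for p) than r->p.

    `parents` maps each node to its direct parents; `inherits` holds pairs
    (r, q) read as "r inherits q", assumed to be precomputed.
    """
    order = set()
    for r, r_parents in parents.items():
        for q, q_parents in parents.items():
            if r != q and (r, q) in inherits:
                # Every property node p that is a parent of both q and r.
                for p in r_parents & q_parents:
                    order.add(((q, p), (r, p)))
    return order
```

For example, with `penguin` below `bird` and both pointing at `flies`, the arc from `bird` to `flies` is ranked below the arc from `penguin` to `flies`.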

To generalize local specificity, we first formalize the defeat relation. Figure 2
shows all the possibilities that can arise for the AIN (where ≺_p indicates the
relative strength of inheritance of p through the arcs).


Fig.  2.  Implicit  Specificity 

Informally, an arc from y to z is defeated for x (where x possesses y) if there is more
specific evidence that prohibits x from propagating further to node z through
that arc. Formally,

Definition 3. Γ defeats the arc from y to z for x if (Γ ▷ x y) or (Γ ▷ x y),
and if any one of the following holds:

- (Strict overrides defeasible.) Γ ▷ x z.

- (Conflict resolution using Specificity, Preferential Inheritance.)

  - there exists an arc w z such that Γ ▷ x w, and furthermore,
    Γ ▷ w y, or y ≺_z w, or

  - there exists an arc w z such that Γ ▷ x w, and furthermore,
    Γ ▷ w y, or y ≺_z w, or

  - there exists an arc w z (resp. w z) such that Γ ▷ x y, or
    Γ ▷ x w, and furthermore, (Γ ▷ w y), or (Γ ▷ w y), or
    y ≺_z w, or

  - there exists an arc w → z (resp. w z) such that Γ ▷ x w or
    Γ ▷ x w, and furthermore, (Γ ▷ w y), or (Γ ▷ w y), or
    y ≺_z w.

Similarly, one can define the concept of defeat for the other types of
defeasible arcs.
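Because the arrow types in the definition above did not survive extraction, the following is only a simplified reading of defeat for a positive defeasible arc y → z, under stated assumptions. The arc is defeated for x if x strictly inherits ¬z, or if x has some node w with a conflicting arc w ↛ z where w lies below y or the conflicting arc is ranked stronger. All parameter names and the pair-set encoding are hypothetical.

```python
def defeats(x, y, z, *, supports, strict_neg, neg_arcs, weaker):
    """Simplified test: is the defeasible arc y -> z defeated for x?

    supports:   pairs (a, b) meaning a is supported in b
    strict_neg: pairs (a, b) meaning a strictly inherits not-b
    neg_arcs:   defeasible negative arcs (w, z), i.e. w -/-> z
    weaker:     pairs ((y, z), (w, z)) meaning y -> z is weaker than w -/-> z
    """
    if (x, y) not in supports:
        return False                  # the arc only matters for an x that has y
    if (x, z) in strict_neg:
        return True                   # strict overrides defeasible
    for w, target in neg_arcs:
        if target == z and (x, w) in supports:
            # A conflicting arc from a node w that x has, where w is below y
            # or the conflicting arc is explicitly ranked stronger.
            if (w, y) in supports or ((y, z), (w, z)) in weaker:
                return True
    return False
```

In the usual penguin example, the arc from `bird` to `flies` is defeated for an individual that is a penguin, but not for one that is only known to be a bird.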

To define the supports relation for defeasible conclusions, we introduce an
auxiliary relation called has_evidence_for, denoted ⊳. We define the defeasible
conclusions supported by Γ due to “forward propagation” through defeasible
arcs as follows.

Definition 4. Γ ⊳ x → z if any one of the following holds:

- there exists an undefeated arc y z (resp. y z) wrt x and Γ ▷ x y.

- there exists an undefeated arc y z (resp. y z, y z) wrt x and
  Γ ▷ x → y.

- there exists an undefeated arc y z (resp. y z) wrt x and Γ ▷ x y.

- there exists an undefeated arc y z (resp. y z, y z) wrt x and
  Γ ▷ x y.

Similarly for the paths of the other types.
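The forward case of Definition 4 amounts to pushing conclusions along undefeated positive arcs until nothing new is reached. The sketch below is a simplification: it treats evidence as support (skipping the skeptical/credulous check), handles only positive arcs, and takes a hypothetical `defeated(x, y, z)` callback in place of the full defeat relation.

```python
from typing import Callable


def forward_closure(x: str, arcs: list[tuple[str, str]],
                    defeated: Callable[[str, str, str], bool]) -> set[str]:
    """Nodes z reached by x through positive defeasible arcs y -> z,
    where x already has y and the arc is not defeated for x."""
    reached = {x}
    changed = True
    while changed:                        # iterate to a fixpoint
        changed = False
        for y, z in arcs:
            if y in reached and z not in reached and not defeated(x, y, z):
                reached.add(z)
                changed = True
    return reached - {x}
```

With no defeat, `tweety` reaches `penguin`, `bird` and `flies`; if the arc from `bird` to `flies` is defeated for `tweety`, `flies` is no longer reached.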

We also define defeasible conclusions supported by Γ due to “backward
propagation” through contrapositive arcs as follows.

Definition 5. Γ ⊳ x z if it is not the case that (Γ ▷ x z) or (Γ ▷ x
z), and if any one of the following holds:

- there exists an arc z y and Γ ▷ x y,

- there exists an arc z y and Γ ▷ x y,

- there exists an arc z y and (Γ ▷ x y) or (Γ ▷ x y),

- there exists an arc z y and (Γ ▷ x y) or (Γ ▷ x y).

Similarly for the paths of type →, and for the remaining types.
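The backward case can be pictured as modus tollens through an arc z → y: if x has ¬y, x gets evidence for ¬z, unless a forward conclusion about z already exists (the blocking condition at the start of Definition 5). This is a sketch under assumptions: the sets are hypothetical inputs, and only positive arcs are covered.

```python
def contrapositive_evidence(pos_arcs: set[tuple[str, str]],
                            neg_supported: set[str],
                            forward_decided: set[str]) -> set[str]:
    """Nodes z for which x gets evidence for not-z by backward propagation.

    pos_arcs:        arcs (z, y) meaning z -> y
    neg_supported:   nodes y such that x is supported in not-y
    forward_decided: nodes on which x already has a forward conclusion
    """
    return {z for (z, y) in pos_arcs
            if y in neg_supported and z not in forward_decided}
```

An x known not to fly gets evidence that it is not a bird, but only when no forward conclusion about `bird` has already been drawn for x.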




The supports relation for defeasible paths can now be defined.

Definition 6. For skeptical nodes p (that is, ¬cred_skep(p) holds):

(Γ ▷ x y) iff (Γ ⊳ x y) and not (Γ ⊳ x y)

(Γ ▷ x y) iff (Γ ⊳ x y) and not (Γ ⊳ x y)

For credulous nodes p (that is, cred_skep(p) holds):

(Γ ▷ x y) iff (Γ ⊳ x y) and not (Γ ▷ x y)

(Γ ▷ x y) iff (Γ ⊳ x y) and not (Γ ▷ x y)

Similarly for the other arc types.
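For a single node and property, the difference between the two attitudes can be sketched with booleans standing for "has evidence for" and "has evidence against". This propositional reduction is an assumption of the sketch: a skeptical node concludes only from one-sided evidence, while a credulous node may adopt either side of a conflict, one choice per expansion. That second behavior is why uniqueness of the expansion can only be expected when every node is skeptical.

```python
from typing import Optional


def skeptical(evidence_for: bool, evidence_against: bool) -> Optional[bool]:
    """Skeptical node: conclude only when the evidence is one-sided."""
    if evidence_for and not evidence_against:
        return True
    if evidence_against and not evidence_for:
        return False
    return None                            # no conclusion either way


def credulous_options(evidence_for: bool, evidence_against: bool) -> set[Optional[bool]]:
    """Credulous node: with conflicting evidence either side may be adopted
    (one choice per expansion), so several outcomes are possible."""
    if evidence_for and evidence_against:
        return {True, False}
    return {skeptical(evidence_for, evidence_against)}
```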

Definition 7.

x inherits y iff (Γ ▷ x ⇒ y) or (Γ ▷ x → y).

x inherits ¬y iff (Γ ▷ x ⇏ y) or (Γ ▷ x ↛ y).

Similarly for ¬x inherits y and ¬x inherits ¬y.

An expansion of AIN Γ is a minimal inherits relation containing the arcs in Γ.

From a practical standpoint, we identify a rich subset of AINs (that properly
includes INs) for which an expansion is “easy” to compute. (For other AINs,
the computation of an expansion may require other book-keeping overheads,
as described in the next section.) The sufficient condition we propose
restricts the propagation of contrapositive defeasible conclusions.

Definition 8. An AIN is bounded if it does not contain arc pairs like x y
and x z (resp. x z), x y and x z (resp. x z), x y and
x z (resp. x z), x y and x z (resp. x z), etc.

The following results can be proven along the lines of similar results shown
for INs in [9].

Theorem. Every bounded AIN has an expansion. Furthermore, if every node
in the bounded AIN is skeptical, then the AIN admits a unique expansion.

3.3  Implementation 

The above specification defines an expansion of a bounded AIN. To compute it
efficiently, nonmonotonic revision of conclusions should be minimized. That is,
we do not assert x inherits y if stronger evidence for x inherits ¬y can
potentially be uncovered. To avoid unnecessary overheads, we compute the meaning
monotonically, by asserting only the final conclusions and not the
intermediate revisable ones. To this end, we constrain the control flow of the inheritance
algorithm using the conflict resolution criteria. In particular, we compute all
strict conclusions before any defeasible conclusion. To compute defeasible
conclusions, we process the nodes bottom-up to let more specific evidence dominate
less specific evidence. Finally, we block conflicting contrapositive conclusions [15] [8].

A sketch of the inheritance algorithm for bounded AINs follows.
It consists of two phases. The first phase computes strict conclusions, while
the second phase computes defeasible conclusions. This allows strict
conclusions to override defeasible conclusions. The first phase consists of a number
of passes through the AIN to propagate contrapositive strict conclusions. The
second phase consists of only two passes for bounded AINs: a bottom-up
pass and a top-down pass. In the former, we compute the “forward” conclusions,
while in the latter, we compute the “contrapositive” conclusions [15] [8]. For
bounded AINs, only two passes are sufficient to compute an expansion because
the boundedness condition prohibits indiscriminate contrapositive propagation
[9]. However, for arbitrary AINs, multiple passes may be required. Furthermore,
this may necessitate nonmonotonic revision until the meaning “stabilizes”.
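Phase one alone can be sketched as a naive fixpoint. The encoding is an assumption, not the implementation of [9]: each strict arc becomes an implication between literals (node, polarity), each arc contributes its contrapositive as well, and passes are repeated until no strict conclusion is added. Conflicting strict conclusions, if the input contains any, are simply left in the closure.

```python
def strict_implications(pos_arcs, neg_arcs):
    """Turn strict arcs into implications between literals (node, polarity),
    adding the contrapositive of each arc."""
    imps = []
    for x, y in pos_arcs:                      # x => y
        imps.append(((x, True), (y, True)))
        imps.append(((y, False), (x, False)))  # not-y => not-x
    for x, y in neg_arcs:                      # x =/=> y
        imps.append(((x, True), (y, False)))
        imps.append(((y, True), (x, False)))   # y => not-x
    return imps


def phase_one(pos_arcs, neg_arcs):
    """Repeat passes over all implications until no strict conclusion is added.
    Returns the strict closure of every literal and the number of passes."""
    imps = strict_implications(pos_arcs, neg_arcs)
    literals = {lit for imp in imps for lit in imp}
    closure = {lit: {lit} for lit in literals}
    passes, changed = 0, True
    while changed:
        changed = False
        passes += 1
        for lit in literals:
            for a, b in imps:
                if a in closure[lit] and b not in closure[lit]:
                    closure[lit].add(b)
                    changed = True
    return closure, passes
```

With A ⇒ B, B ⇒ C and C ⇏ D, the closure of A contains B, C and ¬D, and the contrapositive chain gives D the strict conclusion ¬A.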

The computation of an expansion of a bounded AIN requires polynomial time.
Phase one requires multiple passes over the AIN, but the number of passes is
bounded by the length of the longest path in the network. Phase two can
be done by generalizing the polynomial-time two-pass inheritance algorithm for
INs given in [9].
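The pass bound for phase one is the length of the longest path. Assuming the network is acyclic and given as (source, target) pairs (an assumed encoding), that length can be computed in linear time with memoized depth-first search, as in this sketch.

```python
from functools import lru_cache


def longest_path_length(arcs):
    """Number of arcs on the longest path of an acyclic network."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, []).append(b)

    @lru_cache(maxsize=None)
    def depth(node):
        # Longest path starting at node; 0 for a node with no outgoing arcs.
        return max((1 + depth(m) for m in succ.get(node, [])), default=0)

    nodes = {a for a, _ in arcs} | {b for _, b in arcs}
    return max((depth(n) for n in nodes), default=0)
```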

4  Discussion  and  Conclusion 

We have proposed expressive enhancements to INs that enable a knowledge
engineer to make explicit information that is present in the input but cannot be
represented using INs. We believe that the proposed extensions will benefit
the user because they permit the user to decide what knowledge can be encoded
in the system, and give the user understandable formal guarantees about the
quality of the conclusions that will be generated [1].

In contrast with the path-based approaches discussed in [11] [20], our
approach is “conclusion”-based. That is, an expansion is a collection of conclusions
supported by the AIN, and not a set of permitted paths. Thus, we have no
counterparts for the floating conclusions and zombie paths of [11], or the reinstaters
of [20].

We now briefly touch upon some of the examples given in [20]. Consider
the wild-chicken example of [20]. Wild-chickens are chickens, and chickens are
birds. Chickens have weak-wings, while birds and wild-chickens have strong-wings.
Entities with weak-wings do not fly, while entities with strong-wings do
fly.  This  can  be  represented  in  our  formalism  as  shown  below;  {WC  C. 
C  ^  B,  WC  ^  WW,  C  WW,  B  ^  WW,  WW  ^  F,  WW  —  F}. 
Our theory supports the following conclusions, which are intuitively satisfactory:
If x is a bird, then it has strong wings and it flies. If x is a chicken, then it has
weak wings and it does not fly. If x is a wild chicken, then it has strong wings
and it flies.
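These conclusions can be reproduced by a toy "most specific class wins" computation. The sketch is not the AIN algorithm: it hard-codes a hypothetical encoding (strong wings read as ¬WW, each wing assertion attached directly to its class, flying derived from wings by WW ↛ F and ¬WW → F) and applies only the specificity criterion, returning no conclusion when unrelated classes disagree.

```python
# Hypothetical encoding of the wild-chicken example.
ISA = {"WC": {"C"}, "C": {"B"}, "B": set()}
WEAK_WINGS = {"WC": False, "C": True, "B": False}   # direct wing assertions


def ancestors(node: str) -> set[str]:
    """The node itself and every class above it in the IS-A hierarchy."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(ISA[n])
    return seen


def more_specific(a: str, b: str) -> bool:
    """True if class a lies strictly below class b."""
    return a != b and b in ancestors(a)


def weak_wings(node: str):
    """The most specific assertion wins; disagreement among the most
    specific asserters yields None (a skeptical answer)."""
    asserters = [a for a in ancestors(node) if a in WEAK_WINGS]
    best = [a for a in asserters
            if not any(more_specific(b, a) for b in asserters)]
    values = {WEAK_WINGS[a] for a in best}
    return values.pop() if len(values) == 1 else None


def flies(node: str):
    """WW -> not F and not-WW -> F."""
    ww = weak_wings(node)
    return None if ww is None else not ww
```

Birds and wild chickens fly (strong wings), while chickens do not (weak wings).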

The differences between the specificity espoused here and that in [5] are
described in [9] [10]. One can further extend AINs to indicate disjoint pairs of
nodes, to specify when the children of a node are exhaustive [12], or to tag each
conclusion explicitly with the class name to help in conflict resolution.
Acknowledgement 

I  would  like  to  thank  Yaqiong  Zhang  for  going  through  an  earlier  draft  of  this  paper. 




References 

1. F. Bacchus, A modest, but semantically well-founded, inheritance reasoner. In
IJCAI-89, pp. 1104-1109, 1989.

2. N. Belnap, How a computer should think, in: G. Ryle (ed.), Contemporary Aspects
of Philosophy (Oriel Press, 1977) 30-56.

3. H. Geffner, Default Reasoning: Causal and Conditional Theories, Ph.D. Dissertation,
University of California at Los Angeles, 1989.

4. M. L. Ginsberg, A local formalization of inheritance: preliminary report, 1988.

5. J. Horty, R. Thomason, and D. Touretzky, A skeptical theory of inheritance in
nonmonotonic semantic networks. Artificial Intelligence, 42 (1990) 311-348.

6. J. Horty and R. Thomason, Mixing strict and defeasible inheritance. In AAAI-88,
pp. 427-432, 1988.

7. T. Krishnaprasad, M. Kifer, and D. S. Warren, On the declarative semantics of
inheritance networks. In IJCAI-89, pp. 1093-1098, 1989.

8. Krishnaprasad Thirunarayan, The semantics of inheritance networks, Ph.D. Dissertation,
State University of New York at Stony Brook, 1989.

9. Krishnaprasad Thirunarayan, Implementation of an efficient inheritance reasoner.
Technical Report WSU-04-91, Wright State University, 1991.

10. Krishnaprasad Thirunarayan and M. Kifer, A theory of nonmonotonic inheritance
based on annotated logic. In Artificial Intelligence 60 (1993).

11. D. Makinson and K. Schlechta, Floating conclusions and zombie paths: two deep
difficulties in the “directly skeptical” approach to defeasible inheritance nets. In
Artificial Intelligence 48 (1991) 199-209.

12. E. Neufeld, Defaults and probabilities; extensions and coherence. In KR-89, pp.
312-323, 1989.

13. L. Padgham, Negative reasoning using inheritance. In IJCAI-89, pp. 1086-1092,
1989.

14. J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, 1988.

15. H. Przymusinska and M. Gelfond, Inheritance hierarchies and autoepistemic logic,
University of Texas at El Paso, 1989.

16. B. Selman and H. J. Levesque, The tractability of path-based inheritance. In
IJCAI-89, pp. 1140-1145, 1989.

17. L. A. Stein, A preference-based approach to inheritance. Brown University, 1990.

18. R. Thomason, J. Horty, and D. Touretzky, A calculus for inheritance in monotonic
semantic nets. In ISMIS-87, pp. 280-287, 1987.

19. R. Thomason and J. Horty, Logics for inheritance theory. In Non-Monotonic
Reasoning, M. Reinfrank et al. (eds.), 1989.

20. D. Touretzky, R. Thomason, and J. Horty, A skeptic’s menagerie: conflictors,
preemptors, reinstaters, and zombies in nonmonotonic inheritance. In IJCAI-91,
pp. 478-483, 1991.

21. D. Touretzky, The mathematics of inheritance systems, Morgan Kaufmann, 1986.

22. D. Touretzky, J. Horty, and R. Thomason, A clash of intuitions: the current state
of nonmonotonic multiple inheritance systems. In IJCAI-87, pp. 476-482, 1987.


A  Connectionist-Symbolic  Cognitive  Model 


Guilherme  Bittencourt 

Laboratório Associado de Computação e Matemática Aplicada
Instituto Nacional de Pesquisas Espaciais
Caixa Postal 515 - CEP 12.201-097 - São José dos Campos - SP - Brazil
E-mail: inpelac@brfapesp.br

Abstract. This paper describes a formal model for cognitive activity,
which is an instantiation of a more general proposal for a research line in
artificial intelligence. The main contribution of the paper is the specification of
a wave propagation model that performs inference in predicate logic without
quantifiers through an interference mechanism. The model is highly parallel,
and is flexible enough to be extended, in a natural way, to simulate first-order
logic, fuzzy logic, four-valued logic, and uncertain reasoning. The model is
part of an architecture integrating neural networks and symbolic reasoning
to simulate cognitive activities.

1  Introduction 

Artificial intelligence could be characterized as a collection of techniques adapted to
solve specific problems, with even the fundamental research divided among several
approaches, e.g. the physical symbol systems model (Newell and Simon, 1976) and
the connectionist model (such as Rumelhart and McClelland, 1986a and 1986b).

In a special volume of the Artificial Intelligence Journal on Foundations of
Artificial Intelligence, Kirsh (1991a) identifies five issues which have become focal points
of debate in the field, and serve as dividing lines of positions:

-  Pre-eminence  of  knowledge  and  conceptualization 

-  Disembodiment 

-  Language-like  structure  of  the  kinematics  of  cognition 

- Possible independence of learning and cognition

-  Uniform  architecture 

This paper describes a formal model for cognitive activity, which is an
instantiation of a more general proposal for a research line in artificial intelligence. The
arguments underlying this proposal are the following: (i) the only known intelligent
being is the result of an evolutionary process modelled by the principles of natural
selection, and (ii) the physical entity that embodies human intelligence, the brain, is built
out of matter that can be modelled by physical theories.

Based on these arguments, we propose the application of the following two basic
principles to guide simulations of cognitive activity:

Principle 1 The cognitive activity should be simulated through a model
epistemologically (McCarthy and Hayes, 1969) compatible with the theory of evolution.




Principle 2 The cognitive activity should be based on the interaction of a large
number of functionally independent units of a few types (Changeux, 1983). The
communication between different units should be simulated through the emission
and reception of information waves.

According to the above principles, the proposed research line implies the following
positions with respect to the above foundational issues:

- Concepts are fundamental to the cognitive capacity and take the form of
stationary waves in a medium characterized by a particular structure of independent
units. These waves are not always active, but can be stored in the form
of temporal sequences of specific wave frequencies. Memory can be explained as
the propagation and interference pattern of a set of such waves, activated by
some input flow of information.

-  Perception  and  motor  control  are  the  basic  mechanisms  for  the  formation  of 
particular  wave  patterns  which  constitute  a  concept  for  each  cognitive  agent.  The 
apparent  similarity  of  concepts  acquired  by  cognitive  agents  is  a  consequence  of 
the  regularities  of  the  physical  world  and  cultural  interaction,  the  lower  levels  of 
cognition  being  specific  to  each  agent. 

- Language is the basis for the creation of wave patterns associated with abstract
concepts. Without language, cognitive activity is restricted to correlations of
different senses and their significance to the survival of the living agent.

- Learning and cognition are closely related. Between the levels of sensory
perception and abstract language there are several levels of complexity. The kinematics
of cognition are similar at both extreme levels, but the intermediate levels are
particular to each agent and built through learning activity according to natural
selection mechanisms.

-  There  is  a  single  architecture  underlying  all  cognition,  but  only  in  the  sense  that 
the  building  blocks  are  similar.  Cognition  is  mostly  a  dynamic  property  of  the 
physical  brain.  The  same  concept  is  stored  in  totally  different  wave  patterns  in 
different  cognitive  agents. 

The point of this paper is to introduce a simple formal model coherent with the
above arguments. This model intends to simulate a cognitive agent that
communicates with the external world, stores information about the environment, makes
inferences using the available information, and learns to generate adequate reactions
according to the state of the environment. The cognitive agent is modelled as a
community of formal cognitive organisms whose behavior is defined according to
natural selection principles.

The model associated with each cognitive organism consists of two levels: (i) the
categorization level, which corresponds to the communication process between the
environment, the memory and the propagation level, and (ii) the propagation level,
which corresponds to the reasoning process. The formalism used to define the
categorization level is neural network theory. The propagation level is defined as a theorem
prover, based on a wave propagation formalism, acting on the information provided
by the categorization level.

The  main  contribution  of  this  paper  is  the  specification  of  the  wave  propagation 
model  that  performs  inference  in  predicate  logic  without  quantifiers  through  an 




interference  mechanism.  This  model  is  highly  parallel,  and  is  flexible  enough  to  be 
extended,  in  a  natural  way,  to  simulate  first-order  logic,  fuzzy  logic,  four-valued  logic, 
and  uncertain  reasoning.  A  further  contribution  is  the  introduction  of  an  architecture 
integrating  neural  networks  and  symbolic  reasoning  to  simulate  cognitive  activities. 

The paper is organized as follows: In Section 2, the categorization level is sketched.
In Section 3, the propagation level and its logical interpretation are defined and its
behavior is analyzed. In Section 4, we describe a control mechanism based on natural
selection principles, and in Section 5, we comment upon some possible extensions
to the model. In Section 6, the proposed model is compared with some other
proposals in the literature. Finally, in Section 7, we present some conclusions and discuss
future directions for research.

2  Categorization 

The functions of the categorization level are: classification of information received
from the environment, storage/retrieval of information into/from the memory, and
transmission of information to the propagation level.

To formalize the environment, we suppose the availability of information patterns
about the world, and of evaluation functions. These functions can be used to evaluate
a given information vector according to its usefulness to the cognitive organism’s
goals (Newell and Simon, 1976). More formally:

Given an arbitrary environment E, and the set $I_N$ of information vectors of N bits,
we define the mapping $\gamma$:

$$\gamma : E \to I_N$$

This mapping corresponds to sensory functions. For present purposes, we
consider only the information vectors; any further detail about sensory mechanisms is
beyond the scope of the paper.

Let n be the cardinality of the predicate set P, which is used in the propagation
level, and let $\mathbb{R}$ be the real numbers. We introduce the evaluation function $\rho$:

$$\rho : (I_N)^n \to \mathbb{R}$$

This function can give an a priori evaluation of each possible situation, and is a
model for adaptation mechanisms such as fear, hunger, sexual attraction, etc.

The communication between the categorization and propagation levels is achieved
by associating each information pattern available from the environment
with an element of some Herbrand universe H:

$$\mu : I_N \to H$$

We next define the function $\phi$ from the set H to the natural numbers:

$$\phi : H \to \mathbb{N}$$

This function defines a possible order on the Herbrand universe, corresponding to
an order of the information patterns, and is modelled by a neural network
(Smolensky, 1991) able to associate a natural number with each information pattern. Initially,
the neural network classifies the information according to temporal order.




Given a vector $i \in I_N$, this information is transmitted to the propagation level in
the form of n waves, one for each predicate in P. The propagation space, as
defined in the next section, is an n-dimensional space where each axis is
associated with a characteristic frequency representing a predicate. Each wave starts
its propagation from a predicate axis, with the characteristic frequency associated
with that axis. The point on the axis corresponds to the number associated with the
information pattern through the mappings $\mu$ and $\phi$, i.e. the number $\phi(\mu(i))$.
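The hand-off from the categorization level can be sketched as follows. Everything concrete here is a hypothetical stand-in: the frequency values, a `mu` that simply names a Herbrand constant after the bit pattern, and a `phi` that numbers terms in order of first appearance, which imitates the initial classification by temporal order. Numbering starts at 1 in this sketch so that no term shares the coordinate 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wave:
    predicate: str
    frequency: float   # characteristic frequency of the predicate's axis
    position: int      # starting point on that axis: phi(mu(i))


class Categorizer:
    def __init__(self, frequencies: dict[str, float]):
        self.frequencies = frequencies      # hypothetical: predicate -> frequency
        self.order: dict[str, int] = {}     # phi, built incrementally

    def mu(self, info_vector: tuple[int, ...]) -> str:
        # Hypothetical mu: name a Herbrand constant after the bit pattern.
        return "c_" + "".join(str(bit) for bit in info_vector)

    def phi(self, term: str) -> int:
        # Temporal order: the first new term gets 1, the next new one 2, ...
        if term not in self.order:
            self.order[term] = len(self.order) + 1
        return self.order[term]

    def emit(self, info_vector: tuple[int, ...]) -> list[Wave]:
        """One wave per predicate, all starting at the same point phi(mu(i))."""
        position = self.phi(self.mu(info_vector))
        return [Wave(p, f, position) for p, f in self.frequencies.items()]
```

A repeated information vector is mapped to the same point, so it re-emits the same set of waves.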

Let W be a set of points in the propagation space associated with a set of
n-tuples of information vectors $T = \{t_1, \ldots, t_l\}$, such that $\sum_i |\rho(t_i)|$ presents a high
value. Let also the points in W be activated in a given order with the frequencies
$F = \{f_1, \ldots, f_l\}$.

Memory is modelled by a set of such frequency sequences, classified according
to their value for $\sum_i |\rho(t_i)|$. As the activation frequency of a point depends on
the frequencies of the waves that activated it, it is always possible to determine the
predicates involved in the activation. Thus, the frequency sequences contain all the
information necessary to repeat the propagation phenomenon.
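A minimal sketch of this memory, assuming a plain list of (score, frequency sequence) pairs kept best-first and a hypothetical `threshold` standing in for "a high value"; `rho` is supplied by the caller, and the toy evaluation used below is invented for illustration.

```python
from typing import Callable, Sequence


def store(memory: list, frequencies: Sequence[str], tuples: Sequence,
          rho: Callable[[object], float], threshold: float = 0.0) -> None:
    """Keep a frequency sequence if sum |rho(t_i)| reaches the threshold,
    with the memory ranked from highest to lowest score."""
    score = sum(abs(rho(t)) for t in tuples)
    if score >= threshold:
        memory.append((score, list(frequencies)))
        memory.sort(key=lambda item: item[0], reverse=True)
```

For instance, with the toy evaluation `rho = lambda t: t[0] - t[1]`, a sequence built from the tuples `(3, 1)` and `(0, 2)` scores 4 and is ranked above one built from `(1, 1)`, which scores 0.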

To minimize the propagation time, it is desirable that the distance between
points associated with related information vectors be small. This can be achieved
by using the memory information as learning feedback to the neural
network that classifies the information.

If the information flow stops, e.g. during sleep periods, the available time can
be used to compare and combine the frequency sequences in the memory. Without
environmental information, an arbitrary internal time can be imposed on the
propagation mechanism, and previous experiences can be repeated in a controlled way.
It is also possible to refine the consequences of a given frequency sequence without
the perturbation of environmental reactions. Analogous frequency sequences can
be combined to form graphs, where different possibilities are associated with each other
in a single representation of possible choices.

3  Propagation 

The propagation level is a simple formal model to perform inference in a
predicate logic without quantifiers, through a wave propagation mechanism. The input
information is delivered by the categorization level, and is modelled as an ordered
Herbrand universe. The output information is sent back to the categorization level,
and is associated with reactions of the cognitive agent.

Consider the following logical system. Let P be a set of n predicate symbols,
and F be a set of function (and constant) symbols. Let H be the Herbrand universe
constructed with the set F, and C be the set of all closed logical clauses that can
be formed with the predicates in P and the expressions in the set H. Each element
in C corresponds to a set of closed literals, and is called a clause. A clause can be
semantically interpreted as a conjunction of literals, the usual representation adopted
in the literature of automatic theorem proving, or alternatively as a disjunction of
literals.

Given a closed logical expression, it can always be represented in two different
canonical forms: a disjunction of conjunctions of literals (the usual form), or a
conjunction of disjunctions of literals. Each canonical form can be represented as a set
of elements of C, i.e. a set of clauses.

A clause is a contradiction if it contains two identical literals with different
signs. If the interpretation of the representation is a disjunction of conjunctions,
the fact that one clause is a contradiction implies that the whole representation is a
contradiction. If the interpretation is a conjunction of disjunctions, the contradictory
clause can simply be eliminated.

A clause subsumes another clause if it is a subset of the other clause. In both
canonical representations, clauses that are subsumed by other clauses can be eliminated.

Given one of the canonical representations of a logical expression, it is possible
to obtain the other through the distributivity of the logical operators and (∧)
and or (∨) over each other.

The proposed wave propagation model is based on an original logical result:
given any logical expression represented in the usual canonical form, the
transformation of this representation into the other canonical form, followed by the elimination
of the subsumed clauses and the reverse transformation back to the original form,
automatically performs all the possible logical inferences by resolution allowed by the
original representation. This result, and its extension to first-order logic, are presented
in Bittencourt (1993).
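The round trip can be made concrete with a small propositional sketch. The encodings are assumptions of the sketch: literals are strings with a `~` prefix for negation, clauses are frozensets, and a combination containing both P and ~P is dropped in both directions (as a disjunction it is a tautology, as a conjunction it is unsatisfiable, so either way it leaves the enclosing formula unchanged). Starting from the usual form (P ∧ Q) ∨ (¬P ∧ R), the round trip adds the term Q ∧ R, the result of resolving on P in this disjunctive setting.

```python
from itertools import product


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def is_contradictory(clause: frozenset) -> bool:
    return any(negate(lit) in clause for lit in clause)


def remove_subsumed(clauses: set) -> set:
    # A clause that is a proper superset of another clause is subsumed.
    return {c for c in clauses if not any(d < c for d in clauses)}


def distribute(clauses: set) -> set:
    """Switch canonical form by distributivity: every new clause takes
    one literal from each old clause; subsumed clauses are then removed."""
    result = set()
    for choice in product(*[sorted(c) for c in clauses]):
        new = frozenset(choice)
        if not is_contradictory(new):
            result.add(new)
    return remove_subsumed(result)


def round_trip(clauses: set) -> set:
    return distribute(distribute(clauses))
```

The cost is visible in `product`: the number of combinations is the product of the clause sizes, which is the exponential blow-up mentioned below.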

The problem with this method is that it is computationally expensive
(distributivity is exponential), but if an adequate representation for clauses is chosen, it can
be performed in parallel.

We propose a representation that associates each clause in a logical system with a
point in an n-dimensional discrete Euclidean space. First, we recall the definition of
the mapping $\phi$ from the set H to the natural numbers:

$$\phi : H \to \mathbb{N}$$

We also define a total order for the predicate set P.


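We can now define a mapping from the clause set C to n-tuples of integers:

$$\psi : C \to \mathbb{Z}^n, \qquad \psi(L_1 \vee \ldots \vee L_n) = (z_1, \ldots, z_n)$$

where, for all $x \in H$:
- if $L_i = P_i(x)$ then $z_i = \phi(x)$,
- if $L_i = \neg P_i(x)$ then $z_i = -\phi(x)$,
- if $L_i = \mathit{False}$ then $z_i = 0$, and
- $i = \eta(P_i)$, the position of $P_i$ in the total order on P.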
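The mapping ψ can be sketched directly. The input format is an assumption: a clause is a dict from predicate to (polarity, term) with at most one literal per predicate, absent predicates count as False, and φ is given as a dict whose values are at least 1 so that no literal collides with the coordinate 0. Coordinates are 0-indexed here, unlike the text.

```python
def psi(clause, predicate_order, phi):
    """Map a clause to a point with one coordinate per predicate.

    clause: dict predicate -> (positive, term); predicates not present
    count as False and give coordinate 0.
    phi: dict term -> natural number >= 1 (assumed).
    """
    z = [0] * len(predicate_order)
    for predicate, (positive, term) in clause.items():
        i = predicate_order.index(predicate)   # position in the total order
        z[i] = phi[term] if positive else -phi[term]
    return tuple(z)
```

With predicates ordered P, Q, R and φ(a) = 1, φ(b) = 2, the clause P(a) ∨ ¬R(b) maps to the point (1, 0, -2).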




In this notation, the conversion from one canonical representation into the other
can be performed through simple vectorial transformations of the points representing
the clauses. These vectorial transformations can be executed in parallel through the
propagation and interference of information waves, if these waves have the
appropriate frequencies (which are associated with the predicates present in the clause that
originated the wave). In addition, a subsumed clause always lies in a superspace
generated by the subsuming clause point along the other dimensions of the predicate
space. Using this property, all subsumed clauses can be eliminated by an inhibiting
wave propagating from each clause point in the direction of its superspaces.
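The geometric subsumption property reduces to a coordinate test on clause points: a subsumes b exactly when b agrees with a on every axis where a is non-zero, i.e. b lies in the subspace through a spanned by the axes where a is zero. The sketch assumes that φ is injective and never 0, so equal non-zero coordinates mean identical literals.

```python
def subsumes(a, b):
    """True if clause point a subsumes clause point b: wherever a has a
    literal (non-zero coordinate), b has the same literal."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return all(ai == 0 or ai == bi for ai, bi in zip(a, b))
```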


4  Control 

The model of the control mechanism is based on the application of the principles
of natural selection. These principles are the following: (i) perpetuation possibility,
(ii) limited resources, and (iii) sufficient variation to generate evolution. To apply
these principles, it is necessary to identify, in the model, what corresponds to the
organisms, the environment, the resources, and the perpetuation and variation
mechanisms. A formal cognitive organism consists of the interconnection of a propagation
mechanism, a neural network classifier, and a memory.

The environment consists of the flow of information provided by the senses of
a cognitive agent. This flow of information is received by a community of cognitive
organisms. Each organism processes the information and reacts according to the
result. If the reactions are adapted to the environment, the organism receives a good
evaluation. This evaluation can be thought of as the adaptability of the cognitive
agent, for it determines what its necessities and weaknesses are.

The limited resource is time. The cognitive organisms that are able to generate
a relevant answer to a given situation more quickly than the others are considered better
adapted to the specific environment. As long as the answers generated by the
organisms are present in the environment, they can influence other organisms, allowing
the possibility of evolutionary phenomena such as symbiosis, parasitism, etc.

The  main  perpetuation  and  variation  mechanism  is  the  exchange  of  experiences 
between  two  independent  cognitive  agents. 

The natural selection process can be simulated, according to the previous
definitions, by determining the sequence of available information and the evaluation
function. The internal behavior of each cognitive organism corresponds to the
definitions in the previous sections. The total population of cognitive agents simulates
a cognizer acting in the real world.
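One selection step under the "time is the limited resource" principle could look like the sketch below. It is illustrative only: the organism interface (a callable returning an answer and an elapsed time), the `is_relevant` test and the number of `survivors` are all hypothetical, and the variation mechanism (exchange of experiences) is not modelled.

```python
from typing import Callable


def select(organisms: dict[str, Callable[[str], tuple[str, float]]],
           situation: str,
           is_relevant: Callable[[str, str], bool],
           survivors: int) -> list[str]:
    """Keep the `survivors` organisms that give a relevant answer fastest.

    Each organism maps a situation to (answer, elapsed_time); among the
    organisms with relevant answers, the quickest are preferred.
    """
    timed = []
    for name, respond in organisms.items():
        answer, elapsed = respond(situation)
        if is_relevant(situation, answer):
            timed.append((elapsed, name))
    timed.sort()
    return [name for _, name in timed[:survivors]]
```

An organism that answers fastest but irrelevantly is not selected, which reflects the requirement that the answer be relevant as well as quick.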

5  Extensions 

A first possible extension of the propagation model is to use the geometric characteristics of the adopted notation to investigate the global properties of a set of clauses. Topological properties of these sets, such as symmetry, dispersion and relative distances, can guide the structuring of memory information and the classification of situations at the categorization level. This could be the basis for an analogical reasoning mechanism.



Another possibility is to extend the formalism to first-order logic. Quantified expressions can be represented by stationary waves associated with surfaces in the propagation space. These surfaces can be represented analytically, and their interference can be expressed by an algebraic transformation. This makes computer algebra tools, e.g. REDUCE (Hearn, 1987), useful for analysing the wave behavior.

Because the model is parameter dependent, it admits a further extension: allowing a wider range of values for each parameter, which generalizes the logical concepts associated with them. For example, modifying the propagation rules to allow contradictory clauses generalizes the formalism to logics with more than two truth values (Belnap, 1977; Patel-Schneider, 1985). Continuous logical values between false and true, together with propagation rules for uncertain information, extend the formalism to fuzzy logic (Zadeh, 1975).
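To make the two generalizations above concrete, here is a minimal Python sketch of the standard semantics they refer to. It does not reproduce the paper's propagation rules, which are not specified here. Belnap's four values are encoded as pairs of evidence bits (told true, told false), so contradictory clauses yield BOTH. Zadeh's connectives are the usual min, max and 1 - x on degrees in [0, 1]. The function names are illustrative.

```python
from typing import NamedTuple


class Belnap(NamedTuple):
    """A Belnap truth value as a pair of evidence bits: (told true, told false)."""
    t: bool
    f: bool


TRUE = Belnap(True, False)
FALSE = Belnap(False, True)
BOTH = Belnap(True, True)    # contradictory information
NONE = Belnap(False, False)  # no information


def neg(a: Belnap) -> Belnap:
    return Belnap(a.f, a.t)


def conj(a: Belnap, b: Belnap) -> Belnap:
    # meet in the truth ordering
    return Belnap(a.t and b.t, a.f or b.f)


def disj(a: Belnap, b: Belnap) -> Belnap:
    # join in the truth ordering
    return Belnap(a.t or b.t, a.f and b.f)


def combine(a: Belnap, b: Belnap) -> Belnap:
    # join in the information ordering: pool what two sources have reported
    return Belnap(a.t or b.t, a.f or b.f)


# Zadeh's standard connectives on truth degrees in [0, 1]
def f_not(x: float) -> float:
    return 1.0 - x


def f_and(x: float, y: float) -> float:
    return min(x, y)


def f_or(x: float, y: float) -> float:
    return max(x, y)
```

For instance, `combine(TRUE, FALSE)` gives `BOTH`: a clause asserted by one source and denied by another is neither silently dropped nor allowed to trivialize the knowledge base.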

In  the  categorization  model,  it  is  stated  that  the  classification  of  information 
patterns  should  be  modelled  through  neural  networks,  but  the  architecture  of  these 
neural  networks  and  exactly  how  they  connect  to  the  memory  is  yet  to  be  defined. 
The  specification  of  such  architectures  is  a  necessary  extension  to  the  categorization 
model. 

A further direction of extension would be to define a categorization level based on a symbolic approach. Knowledge representation tools, e.g. MANTRA (Bittencourt, 1989 and 1990), are well suited to simulating the necessary functions at the categorization level.

6  Discussion 

In this section, we compare the proposed model with some approaches to simulating cognition presented in the literature. This discussion is not intended to be exhaustive, as it covers only some proposals closely related to the subject of the paper. Detailed descriptions of the metaphors of the mind can be found in Hampden-Turner (1981), and a discussion of the relationship between Artificial Intelligence and Philosophy can be found in Torrance (1984).

Presenting  his  view  of  the  “logical  approach”  to  Artificial  Intelligence,  Nilsson 
(1991)  proposes  the  following  three  theses: 

Thesis  1  Intelligent  machines  will  have  knowledge  of  their  environments. 

Thesis 2 The most versatile intelligent machines will represent much of their knowledge about their environment declaratively.

Thesis 3 For the most versatile machines, the language in which declarative knowledge is represented must be at least as expressive as first-order predicate calculus.

The proposed model supports the first of these theses without restriction, and proposes an economical way of storing knowledge. Whether this storage mechanism is declarative depends on how the word "declarative" is interpreted. What is proposed is a representation closely related to the acquisition and retrieval processes. This representation stores information in a structural form suited to analogical reasoning; in this sense, it is a declarative representation. Although the proposed formal model does not cover first-order logic, it can be extended to first-order logic and beyond. The model also has an intrinsic temporal behavior. This behavior is absent from first-order logic but fundamental to cognitive agents living in a changing environment.

In his response to Nilsson's paper, Birnbaum (1991) identifies three main problems of the logical approach. The first is its emphasis on sound, deductive inference. The second is its tendency to ignore other sorts of reasoning, such as probabilistic reasoning, reasoning from examples or by analogy, and reasoning based on faulty but useful conjectures that are later elaborated and debugged. The third is its presumption that model-theoretic semantics is central to knowledge representation. The proposed model reconciles the two positions. It allows probabilistic, analogical and conjectural reasoning within a framework that takes Herbrand's Theorem as its starting point, which is precisely what Birnbaum considers "the wrong central issue for budding AI scientists to learn".

In his proposal of intelligence without representation, Brooks (1991) argues for the following points in creating artificial intelligence:

- The capabilities of intelligent systems must be built incrementally.

- At each step, complete intelligent systems must be built and let loose in the real world with real sensing and real action.

From these points, he concludes that using the world as a model is better than relying on explicit representations. This conclusion seems to contradict the declarative representation approach, yet the proposed model supports it. The information waves that serve as its declarative representation are built incrementally and obtained directly from the environment through interaction. They also represent exactly those features of the real world that are relevant to the cognitive agent being modelled.

In his response to Brooks' paper, Kirsh (1991b) defends the necessity of concepts for cognitive skills, stating that "to have a concept is (...) to have the capacity to find an invariance across a range of contexts, and to reify that invariance so that it can be combined with other appropriate invariances." The proposed model also supports this statement, since it accurately describes the interaction between the propagation and categorization levels of the model.

Pribram (1971), a biology researcher, proposes a "holographic metaphor" that has many points in common with the proposed model. He states that sensory input is transformed into a brain wave, and that this wave travels to an area of the brain that interprets its meaning. The interpretation is a product of various kinds of standing memory waves. These waves travel across the brain simultaneously and interfere with each other. Pribram does not propose a precise mechanism explaining how the metaphor works. Even so, the arguments that make his metaphor interesting to the biology community could also be used to defend the proposed model. The difference is that the proposed model is formal, so its mechanisms are known. The arguments are that the metaphor explains:
(i) the distribution of memory;
(ii) the associative character of memory;
(iii) the immense amount of information stored in a limited volume;
(iv) the mechanism of the brain as an open cybernetic system of organism plus environment.




Discussing  alternative  metaphors  for  Artificial  Intelligence,  West  and  Travis  (1991) 
state  three  criteria  that  would  argue  in  favor  of  a  metaphor  for  Artificial  Intelligence: 

- Suggestiveness: the generation of referents, on both sides of the metaphoric relationship, that can be used to confirm or dispute the validity of the metaphor.

- Concreteness: the generation of practical avenues of research or testable hypotheses.

- Consistency: internally and in relation to what is known or believed about the mind and brain in other disciplines.

We would like to conclude that these three criteria are satisfied, at least partially, by the general approach itself. The formal model is not the claim here: it is just an attempt to demonstrate that the approach can be formalized.

7  Conclusion 

We presented a unified framework for modelling cognition that integrates symbolic reasoning, neural network theory and natural selection principles. This framework is based on the postulate that there is no discontinuity between physical, organic and cognitive mechanisms (Lupasco, 1974).

The wave propagation formalism of a single cognitive agent has been implemented in Common Lisp. This system, with its graphical interface, is being used to study the geometric representations of the states of the propagation space. The states where no further inferences are possible are of special interest. These "steady states" represent complete, coherent states of knowledge that can be reached from several different sets of initial pieces of information. Geometric manipulations that transform one of these states into another suggest a type of analogy mechanism: they map one coherent knowledge state into another that is supported by different initial informational configurations.
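The notion of a steady state can be sketched in Python as the fixed point of inference. This is a deliberate substitution: the wave propagation mechanism is replaced by plain forward chaining over propositional Horn clauses (premises, conclusion), and the clause set `RULES` is hypothetical. The sketch shows the property highlighted above: different initial pieces of information can lead to the same steady state.

```python
def steady_state(facts, rules):
    """Forward-chain Horn rules (premises, conclusion) until no new fact can be inferred."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return frozenset(known)


# Hypothetical clause set: p -> q, q -> r, r -> p
RULES = [
    (frozenset({"p"}), "q"),
    (frozenset({"q"}), "r"),
    (frozenset({"r"}), "p"),
]

if __name__ == "__main__":
    # Two different initial configurations reach the same coherent state {p, q, r}.
    print(steady_state({"p"}, RULES) == steady_state({"r"}, RULES))
```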

References 

1. BELNAP, N.D., A Useful Four-Valued Logic. In "J.M. Dunn and G. Epstein (Editors), Modern Uses of Multiple-Valued Logics", D. Reidel Pub. Co., 1977.

2. BIRNBAUM, L., Rigor mortis: a response to Nilsson's "Logic and Artificial Intelligence". Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 57-77, January 1991.

3. BITTENCOURT, G., A Hybrid System Architecture and its Unified Semantics. In "Z.W. Ras (Editor), Proceedings of The Fourth International Symposium on Methodologies for Intelligent Systems", Charlotte, North Carolina, USA, October 12-14, pp. 150-157, 1989.

4. BITTENCOURT, G., An Architecture for Hybrid Knowledge Representation. Ph.D. Thesis, Universität Karlsruhe, Germany, 31 January 1990.

5. BITTENCOURT, G., Space Embodiment and Natural Selection. Internal Report, INPE, 1993.

6. BROOKS, R.A., Intelligence without Representation. Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 139-159, January 1991.

7. CHANGEUX, J.-P., L'Homme Neuronal. Collection Pluriel, Librairie Arthème Fayard.

8. HAMPDEN-TURNER, C., Maps of the Mind: Charts and Concepts of the Mind and its Labyrinths. Collier, New York, 1981.

9. HEARN, A.C., REDUCE User's Manual: Version 3.3. RAND Publication CP78, The RAND Corporation, Santa Monica, CA, April 1987.

10. KIRSH, D., Foundations of AI: the Big Issues. Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 3-30, January 1991a.

11. KIRSH, D., Today the Earwig, Tomorrow Man? Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 161-184, January 1991b.

12. LUPASCO, S., L'énergie et la matière vivante. Antagonisme constructeur et logique de l'hétérogène. Julliard, Paris, 1974.

13. McCARTHY, J. and HAYES, P.J., Some Philosophical Problems from the Standpoint of Artificial Intelligence. In "D. Michie and B. Meltzer (Editors), Machine Intelligence 4", Edinburgh University Press, Edinburgh, GB, pp. 463-502, 1969.

14. MINSKY, M.L. and PAPERT, S.A., Perceptrons: An Introduction to Computational Geometry. M.I.T. Press, 1969.

15. NEWELL, A. and SIMON, H.A., Computer Science as Empirical Inquiry: Symbols and Search. Communications of the ACM, Vol. 19, No. 3, pp. 113-126, March 1976.

16. NILSSON, N.J., Logic and Artificial Intelligence. Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 31-56, January 1991.

17. NORMAN, D.A., Approaches to the Study of Intelligence. Artificial Intelligence (Special Volume Foundations of Artificial Intelligence), Vol. 47, No. 1-3, pp. 327-346, January 1991.

18. PATEL-SCHNEIDER, P.F., A Decidable First-Order Logic for Knowledge Representation. Proceedings of IJCAI 9, pp. 455-458, 1985.

19. PRIBRAM, K., Language of the Brain: Experimental Paradoxes and Principles in Neuropsychology. Prentice Hall, Englewood Cliffs, N.J., 1971.

20. RUMELHART, D.E. and McCLELLAND, J. (Editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition 1: Foundations. M.I.T. Press, Cambridge, MA, 1986a.

21. RUMELHART, D.E. and McCLELLAND, J. (Editors), Parallel Distributed Processing: Explorations in the Microstructure of Cognition 2: Psychological and Biological Models. M.I.T. Press, Cambridge, MA, 1986b.

22. SMOLENSKY, P., Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Artificial Intelligence (Special Issue on Connectionist Symbol Processing), Vol. 46, No. 1-2, pp. 159-216, January 1991.

23. TORRANCE, S. (Editor), The Mind and the Machine: Philosophical Aspects of Artificial Intelligence. Ellis Horwood Series in Artificial Intelligence, John Wiley & Sons, 1984.

24. WEST, D.M. and TRAVIS, L.E., From Society to Landscape: Alternative Metaphors for Artificial Intelligence. AI Magazine, pp. 69-83, Summer 1991.

25. ZADEH, L.A., Fuzzy Logic and Approximate Reasoning. Synthese, Vol. 30, pp. 407-428, 1975.


Multi-Context  Systems  as  a  Tool  to  Model 
Temporal  Evolution 


Mauro Di Manzo and Enrico Giunchiglia

Mechanized Reasoning Group
DIST - University of Genoa
Via Opera Pia 11A, 16145 Genoa, Italy
enrico@dist.unige.it

Keywords: Context, Multi-Context Systems


Abstract. Contexts are defined as axiomatic formal systems. More than one context can be defined, each one modeling/solving (part of) the problem. The (global) model/solution of the problem is obtained by making contexts communicate via bridge rules. Bridge rules and contexts are the components of Multi-Context systems. In this paper we study the applicability of multi-context systems to reasoning about temporal evolution. The basic idea is to associate a context with each temporal interval in which the "model" of the problem does not change (corresponding to a state of the system). Switches among contexts (corresponding to modifications of the model) are controlled via a metatheoretic context responsible for keeping track of the temporal evolution. This has three benefits:
(i) we keep a clear distinction between the theory describing the particular system at hand and the theory necessary for predicting the temporal evolution;
(ii) we have simple object-level models of the system states;
(iii) the theorem prover can analyze and answer queries about a particular state faster.
The temporal evolution of a U-tube is taken as an example to show both the proposed framework and the GETFOL implementation.


1  Introduction 

Recent work in artificial intelligence [GW88, Giu91, McC90, McC91, Sho91] advocates the use of multiple formal systems (called contexts) both in knowledge representation and in problem solving. Although there is no consensus about how contexts should be (formally) defined, there is a common intuition that many (reasoning) phenomena are "better" modeled as a set of interacting contexts, each one modeling only part of the whole.

This paper investigates the applicability of contexts to modeling the temporal evolution of dynamic systems. We define a context as an axiomatic formal system modeling what is known or assumed about the system in an instant/interval of time (a state of the situation). The temporal evolution of the system is modeled by a tree of (temporally related) contexts, called state-contexts (s-contexts). A particular metatheoretic Problem Solving Context (PSC) controls the evolution of the system and keeps track of the temporal evolution. PSC and s-contexts are related via bridge rules (i.e. rules enabling the derivation of a fact in one context on the basis of facts derived in other contexts) [Giu91].

The multi-contextual approach we adopt has three main advantages. First, we keep a clear distinction between the theory describing the particular system at hand and the theory necessary for predicting the system evolution. A different system can be modeled simply by specifying its structural description in a "new" s-context, while the Problem Solving Context stays the same. Second, each s-context provides a simple object-level description of a system state. Different facts referring to different states do not cause contradictions, since they are asserted in distinct contexts. Finally, when reasoning about a particular system state we may directly consider the corresponding s-context. As an obvious consequence, the performance of the theorem prover improves, since the search space (in each context) is smaller.

We first provide a theoretical characterization of the general framework (section 2). In section 3 we model the behavior of a U-tube as an example illustrating both the theory and the implementation built in the GETFOL system [GT91]. Finally, in section 4 we make some final considerations, and we conclude with the acknowledgements (section 5).


2  Reasoning  about  temporal  evolution 

We define a context $C$ to be an axiomatic formal system, i.e. a triple consisting of a language $L$, a set of axioms $\Omega \subseteq L$ and a set of inference rules $\Delta$: $C = \langle L, \Omega, \Delta \rangle$. Beyond this technical/logical definition, a context is intended to model what the agent knows or assumes to hold while reasoning about a situation. The temporal evolution of a given system (according to the agent's mental image) is modeled by a set of contexts, each one associated with a particular state. Contexts communicate via bridge rules: rules whose premises and conclusions belong to different contexts [Giu91].

Given a family of contexts $\{C_i = \langle L_i, \Omega_i, \Delta_i \rangle\}_{i \in I}$, we write $\langle A, i \rangle$ to specify, when ambiguous, that $A$ is a wff of language $L_i$ ($A$ is an $L_i$-wff). Adopting a suitable modification of the natural deduction formalism, notation and terminology defined in [Pra65], we write bridge rules as:

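$$\frac{\langle A_1, i_1\rangle \;\cdots\; \langle A_r, i_r\rangle \qquad \begin{array}{c}[\langle B_1, j_1\rangle] \cdots [\langle B_m, j_m\rangle]\\ \vdots\\ \langle A_{r+1}, i_{r+1}\rangle \;\cdots\; \langle A_n, i_n\rangle\end{array}}{\langle A_{n+1}, i_{n+1}\rangle}\;\delta$$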


Consistently with the notation of [Pra65], $\delta$ represents a rule discharging the assumptions $\langle B_1, j_1\rangle, \ldots, \langle B_m, j_m\rangle$.

Contexts and bridge rules are the elements of multi-context systems (MC systems). Within MC systems we can define a notion of derivation spanning multiple contexts.


Definition 1 (Multi-Context System). Let $I$ be a set of indices and $\{C_i = \langle L_i, \Omega_i, \Delta_i \rangle\}_{i \in I}$ a family of contexts. A Multi-Context Formal System (MC system) $MS$ is a pair $\langle \{C_i\}_{i \in I}, \Delta_{MS} \rangle$, where $\Delta_{MS}$ is a set of bridge rules.¹

In an MC system, deductions are trees of wffs built from a finite number of assumptions and axioms, possibly belonging to distinct languages, by applying a finite number of inference rules. $\langle A, i \rangle$ is derivable from a set of wffs $\Gamma$ in an MC system $MS$ ($\Gamma \vdash_{MS} \langle A, i \rangle$) if there is a deduction with bottom wff $\langle A, i \rangle$ whose undischarged assumptions are in $\Gamma$. $\langle A, i \rangle$ is a theorem in $MS$ ($\vdash_{MS} \langle A, i \rangle$) if it is derivable from the empty set.

Any deduction can be seen as composed of subdeductions in distinct contexts $C_i$ (obtained by repeated applications of $\Delta_i$-rules), any two or more subdeductions being concatenated by one or more applications of bridge rules.
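The way a derivation alternates between subdeductions inside contexts and bridge-rule steps can be pictured with a toy Python sketch. It is a propositional simplification, not GETFOL and not the general definition. Each $\Delta_i$ is assumed to be a list of Horn rules over the context's own wffs, and a labeled wff $\langle A, i \rangle$ is encoded as the tuple `(A, i)`. Only theorems (derivations from the empty set) are computed, so assumption-discharging rules such as $\delta$ are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MCSystem:
    """Propositional toy MC system: Horn rules inside contexts plus bridge rules between them."""
    axioms: dict        # context index i -> set of axiom wffs (Omega_i)
    local_rules: dict   # context index i -> [(frozenset of premises, conclusion)] (Delta_i)
    bridge_rules: list = field(default_factory=list)  # [(frozenset of (A, i) premises, (A, i) conclusion)]

    def theorems(self) -> set:
        """All labeled wffs (A, i) derivable from the empty set of assumptions."""
        derived = {(a, i) for i, wffs in self.axioms.items() for a in wffs}
        changed = True
        while changed:
            changed = False
            # subdeductions inside each context C_i
            for i, rules in self.local_rules.items():
                for premises, conclusion in rules:
                    if (conclusion, i) not in derived and all((p, i) in derived for p in premises):
                        derived.add((conclusion, i))
                        changed = True
            # bridge rules concatenate subdeductions across contexts
            for premises, conclusion in self.bridge_rules:
                if conclusion not in derived and premises <= derived:
                    derived.add(conclusion)
                    changed = True
        return derived


if __name__ == "__main__":
    ms = MCSystem(
        axioms={"C0": {"a"}},
        local_rules={"C0": [(frozenset({"a"}), "b")], "C1": [(frozenset({"b"}), "c")]},
        bridge_rules=[(frozenset({("b", "C0")}), ("b", "C1"))],
    )
    print(("c", "C1") in ms.theorems())
```

In the example, `c` is a theorem of context C1 only because a bridge rule transports `b` from C0, where it was derived locally.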

Although Definition 1 is more general, allowing arbitrary formal languages and deductive machinery, in this paper we concentrate on first-order languages. In particular we consider an MC system $MS$ composed of a countable family of contexts (the s-contexts) and a Problem Solving Context (PSC), which axiomatizes the metatheory necessary to solve the problem. Thus, in this paper PSC contains the basic principles for modeling the evolution of dynamic systems. The basic idea is the following. Given the initial description of the system in an s-context, we carry out the problem solving activity in PSC, determining a (meta-level) description of the plausible states of the system. By reflecting results down into s-contexts, we get simple object-level descriptions of such states. Note that not all the reasoning takes place in PSC. Given a description of the system in an s-context, PSC is responsible only for two things. First, it determines the set of descriptions of the system that are compatible with the considered one (supposing the considered one is only a partial description). Second, it determines the state transitions. PSC cannot reason about a situation. For example, it is not possible to deduce in PSC that $\beta$ holds in $C$ from the knowledge that $\alpha$ and $\alpha \rightarrow \beta$ hold in $C$, i.e. $TH(\text{“}\alpha\text{”}, \text{“}C\text{”}), TH(\text{“}\alpha \rightarrow \beta\text{”}, \text{“}C\text{”}) \nvdash_{PSC} TH(\text{“}\beta\text{”}, \text{“}C\text{”})$. On the other hand, we can deduce $\beta$ from $\alpha$ and $\alpha \rightarrow \beta$ via modus ponens directly in context $C$, and reflect this result up into PSC.

This correspondence between PSC facts and s-context facts is obtained by defining $MS$ such that:

- the set of natural deduction rules described in [Pra65] belongs to the set of inference rules of each context in $MS$;

- PSC has names for each s-context and for every formula of the s-contexts. More precisely, we write "w" and "C" for the PSC names of the formula $w$ and the context $C$ respectively;

- in PSC there is a distinguished predicate TH("w", "c") which holds iff the formula $w$, belonging to the language of context $c$, is a theorem (i.e. $\vdash_{MS} \langle w, c \rangle$).

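This is obtained by defining the reflection principles [GS89]

$$\frac{\langle w, C \rangle}{\langle TH(\text{“}w\text{”}, \text{“}C\text{”}), PSC \rangle} \qquad\qquad \frac{\langle TH(\text{“}w\text{”}, \text{“}C\text{”}), PSC \rangle}{\langle w, C \rangle}$$

to be the set $\Delta_{MS}$ of bridge rules.

¹ Note that for some $i, j \in I$, we may have $C_i = C_j$.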
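The two reflection principles can be read as a pair of inverse translations between labeled s-context wffs and PSC facts. A minimal Python sketch follows. It assumes, hypothetically, that the PSC name of a formula or context is simply the quoted string and that TH facts are stored as text. The paper only states that such names exist; it does not say how they are represented.

```python
import re


def name(expr: str) -> str:
    """Hypothetical PSC name of an s-context formula or context: the quoted string."""
    return f'"{expr}"'


def reflect_up(wff: str, ctx: str) -> tuple:
    """From <w, C> conclude <TH("w","C"), PSC>."""
    return (f"TH({name(wff)},{name(ctx)})", "PSC")


_TH = re.compile(r'TH\("(.*)","(.*)"\)')


def reflect_down(fact: tuple) -> tuple:
    """From <TH("w","C"), PSC> conclude <w, C>."""
    wff, ctx = fact
    match = _TH.fullmatch(wff)
    if ctx != "PSC" or match is None:
        raise ValueError(f"not a PSC theoremhood fact: {fact!r}")
    return (match.group(1), match.group(2))


if __name__ == "__main__":
    up = reflect_up("a imp b", "C0")
    print(up)                 # ('TH("a imp b","C0")', 'PSC')
    print(reflect_down(up))   # ('a imp b', 'C0')
```

Reflecting down a fact that was reflected up returns the original labeled wff, which is the round trip that makes PSC a faithful metatheory of provability in the s-contexts.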

So far, PSC represents a correct and complete metatheory of provability in the s-contexts. What we want to do is to use PSC to predict the behavior of the system described in the s-contexts. For this task, we need a PSC predicate that lets us make "non-monotonic" extensions to the situation described in an s-context. We thus introduce a predicate CBB(w, p, c), meaning that the wff $w$ Can-Be-Believed with respect to the problem $p$ in context $c$. Notice that we are assuming a many-sorted language, as the GETFOL language is. Notice also that the set of facts we can believe depends not only on the situation (i.e. the s-context) but also on the problem we are reasoning about. For the same situation, we may have different beliefs and/or reasoning principles (leading to different sets of believable facts) depending on the problem we are facing. In any case, the wff $w$ cannot be asserted in the context $c$, since it may invalidate other beliefs we have about the situation. It must be asserted in a "new" s-context. This is obtained by defining an injective function mkcxt(w, p, c) which, applied to ground terms, gives a constant of sort context (the PSC name of an s-context). We recall that, by the definition of injective function, mkcxt satisfies the following fact:


$$mkcxt(w_1, p_1, c_1) = mkcxt(w_2, p_2, c_2) \rightarrow (w_1 = w_2 \wedge p_1 = p_2 \wedge c_1 = c_2)$$

To keep track of the state transitions in a compact way, we introduce a predicate Context-To-Context (C2C). C2C(w, p, c1, c2) records the fact that context c2 has been introduced (in the deduction) while solving problem p in context c1.

We are now able to write the PSC axioms stating what we have informally described above:

PSC1: $\forall w\,p\,c.\;(CBB(w, p, c) \rightarrow TH(w, mkcxt(w, p, c)))$

PSC2: $\forall w\,p\,c_1\,c_2.\;(mkcxt(w, p, c_1) = c_2 \leftrightarrow C2C(w, p, c_1, c_2))$

Summarizing, what we have defined are some basic principles to:

- maintain a "correct" relation between PSC and the s-contexts (via the TH predicate and the reflection principles);

- switch s-context when we want to consider some (new) situation (via the predicates CBB and TH and axiom PSC1), and keep track of the context transitions (via the predicate C2C and axiom PSC2).
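The bookkeeping that PSC1 and PSC2 impose can be sketched in Python. This is a toy, not the GETFOL implementation. It assumes wffs, problems and contexts are plain strings, represents TH and C2C as sets of tuples, and realizes the injective mkcxt by handing out a fresh, hypothetical context name (C1, C2, ...) for each new triple. The method `believe` stands for applying both axioms once CBB(w, p, c) has been proved.

```python
class ProblemSolvingContext:
    """Toy bookkeeping for PSC1/PSC2: believing w in c for problem p opens a new s-context."""

    def __init__(self):
        self.th = set()      # (wff, context) pairs for which TH holds
        self.c2c = set()     # (wff, problem, c1, c2) transition records
        self._names = {}     # mkcxt table: (w, p, c) -> context name

    def mkcxt(self, w: str, p: str, c: str) -> str:
        # Injective by construction: equal triples reuse a name, distinct triples get fresh ones.
        key = (w, p, c)
        if key not in self._names:
            self._names[key] = f"C{len(self._names) + 1}"
        return self._names[key]

    def believe(self, w: str, p: str, c: str) -> str:
        """Apply PSC1 and PSC2 to an already proved CBB(w, p, c)."""
        c2 = self.mkcxt(w, p, c)
        self.th.add((w, c2))          # PSC1: TH(w, mkcxt(w, p, c))
        self.c2c.add((w, p, c, c2))   # PSC2: C2C(w, p, c, mkcxt(w, p, c))
        return c2


if __name__ == "__main__":
    psc = ProblemSolvingContext()
    for rel in ("H1 > H2", "H1 = H2", "H2 > H1"):
        print(rel, "->", psc.believe(rel, "CMPLT", "C0"))
```

Keeping the transitions in an explicit C2C relation, rather than recomputing them from mkcxt, mirrors the paper's aim of tracking state changes compactly.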

What remains to be specified are the problem-specific criteria by which a formula can be believed in a context with respect to a problem (thus causing a state transition), and which facts holding in the context we were reasoning in also hold in the newly considered one.




3 Solving the U-tube example

In this section we show how the basic framework described in the previous section can be extended to model the behavior of a particular dynamic system. Two solutions are proposed for each problem addressed: a simple-minded one, and a slightly more elaborate one relying on some principles commonly used in Qualitative Reasoning [Bob84, WDK89]. To give a flavour of the GETFOL implementation, from here on we use GETFOL syntax and notation (using the typewriter font to highlight GETFOL code and symbols). We consider a simplified version of the U-tube example [For84, Kui86]. We take into account only two quantities: the heights the liquid reaches in the first and in the second container. We call these quantities H1 and H2 respectively, and define them as the only two constants in each s-context. Three different "initial states" are "possible":

H1 > H2      H1 = H2      H2 > H1

Depending on the initial state we have three different behaviors:

1. H1 > H2: in this case there will be a fluid flow from the first container to the second, until H1 = H2 ("final state").

2. H1 = H2: in this case nothing happens and the system is stable.

3. H1 < H2: in this case there will be a fluid flow from the second container to the first, until H1 = H2 ("final state").

These facts are formally stated by the following axiom:

DECLARE INDCONST H1 H2;

AXIOM STRCTDES:
   (H1 > H2 imp (INCREASE(H2) and DECREASE(H1))) and
   (H1 = H2 imp (CONSTANT(H1) and CONSTANT(H2))) and
   (H2 > H1 imp (INCREASE(H1) and DECREASE(H2)));

formalizing the structural description of the U-tube. We take this axiom to be part of the set of axioms of each s-context. For lack of space, we omit the axioms about "≥" (reflexivity, antisymmetry and transitivity) and define "v_i > v_j" as "(v_i ≥ v_j) ∧ ¬(v_i = v_j)". These axioms must be declared in GETFOL; from a theoretical point of view, we consider them part of the logical axioms.
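Read as a function, STRCTDES maps the relation between the two heights to the qualitative trend of each height. The following Python sketch is a plain transcription of the three conjuncts for checking one's reading of the axiom. The `Trend` enum and the use of numeric heights are illustrative assumptions; in GETFOL the heights are just the individual constants H1 and H2.

```python
from enum import Enum


class Trend(Enum):
    INCREASE = "+"
    DECREASE = "-"
    CONSTANT = "0"


def strctdes(h1: float, h2: float) -> tuple:
    """Trends (of H1, of H2) that the STRCTDES axiom implies for the given heights."""
    if h1 > h2:
        return Trend.DECREASE, Trend.INCREASE   # flow from the first container to the second
    if h1 == h2:
        return Trend.CONSTANT, Trend.CONSTANT   # the system is stable
    return Trend.INCREASE, Trend.DECREASE       # flow from the second container to the first


if __name__ == "__main__":
    for heights in ((3, 1), (2, 2), (1, 3)):
        print(heights, strctdes(*heights))
```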

Note that each s-context has a limited number of variables and individual constants, which makes the theory associated with each s-context decidable [Kle52]. So it makes sense to ask whether a fact is consistent with the theory corresponding to an s-context; we make use of this facility in what follows. In PSC we have the sorts WFF, CONTEXT, PROBLEM and VALUE. For each sort, we assume denumerably many variables, obtained from the first letter of the sort, possibly concatenated with a number. Moreover:

- "H1" and "H2" are defined to be constants of sort VALUE and represent the PSC names of the s-context constants H1 and H2 respectively;

- the function symbols mkmj and mkeq map pairs of values into wffs. If "H1" and "H2" are the PSC names of H1 and H2 respectively, then mkmj("H1","H2") and mkeq("H1","H2") are the PSC names of H1 > H2 and H1 = H2 respectively;

- in the same way, the function symbols mkconst, mkincr and mkdecr map a value into a formula: mkconst("H1"), mkincr("H1") and mkdecr("H1") are the names of the s-context formulas CONSTANT(H1), INCREASE(H1) and DECREASE(H1) respectively.
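The naming functions just listed can be mimicked in Python to see how PSC terms denote s-context formulas. The sketch assumes, purely for illustration, that a PSC name is the s-context expression wrapped in double quotes. `quote` and `unquote` are hypothetical helpers, not GETFOL primitives.

```python
def quote(expr: str) -> str:
    """Hypothetical PSC name of an s-context expression: the expression in double quotes."""
    return f'"{expr}"'


def unquote(name: str) -> str:
    """Recover the s-context expression named by a PSC name such as '"H1"'."""
    if len(name) < 2 or not (name.startswith('"') and name.endswith('"')):
        raise ValueError(f"not a PSC name: {name!r}")
    return name[1:-1]


def mkmj(v1: str, v2: str) -> str:
    return quote(f"{unquote(v1)} > {unquote(v2)}")


def mkeq(v1: str, v2: str) -> str:
    return quote(f"{unquote(v1)} = {unquote(v2)}")


def mkconst(v: str) -> str:
    return quote(f"CONSTANT({unquote(v)})")


def mkincr(v: str) -> str:
    return quote(f"INCREASE({unquote(v)})")


def mkdecr(v: str) -> str:
    return quote(f"DECREASE({unquote(v)})")


if __name__ == "__main__":
    print(mkmj('"H1"', '"H2"'))   # "H1 > H2"
    print(mkconst('"H1"'))        # "CONSTANT(H1)"
```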

Suppose we do not know which of the three possible relations between H1 and H2 initially holds. We want to show how it is then possible:

- to create the three contexts corresponding to the three possible initial states. In this phase we say that we solve the "complete" problem (since we disambiguate the initial, only partially known, situation);

- to determine the next state of behavior, starting from each possible initial situation. In this phase, we say that we are solving the "predict" problem (since we predict the next state of behavior).

Different reasoning principles must be used depending on the problem to be solved. To address the two problems separately, we define two constants (of sort PROBLEM) in the PSC context:

DECLARE SORT PROBLEM;

DECLARE INDCONST CMPLT PRDCT [PROBLEM];

In the following sections we omit some of the GETFOL declarations and consider only those relevant to understanding the example.

3.1  Solving  the  complete  problem 

We suppose that our reasoning starts in the context C0, in which the only fact we know is the structural description of the system. The simplest solution to the complete problem would be to assume in PSC that each relation between H1 and H2 can-be-believed, apply axiom PSC1, and derive (depending on these assumptions) each relation in three separate contexts C1, C2 and C3.

If we want a theory proving such results, we have to consider
the following axiom:

AXIOM CMPLT1: forall v1 v2 c.
  (CONSIST(mkmj(v1,v2),c) imp CBB(mkmj(v1,v2),CMPLT,c)) and
  (CONSIST(mkeq(v1,v2),c) imp CBB(mkeq(v1,v2),CMPLT,c)) and
  (CONSIST(mkmj(v2,v1),c) imp CBB(mkmj(v2,v1),CMPLT,c));




It simply says that a relation between two values may hold in a situation if it
is consistent with the facts we know about that situation. Supposing that the
theory associated with each s-context is decidable (as it is in our case), we may
also define an inference rule with no premises and conclusion CONSIST("w","C"),
with the obvious restriction that w must be consistent in C.
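For a theory as small as the one in this example, the decidability assumption is easy to make concrete. The Python sketch below is an illustration only, not the way GETFOL implements the rule: it decides CONSIST(w, C) for a set of order facts by searching for an integer assignment to the value names. The tuple encoding ("mj", x, y) for x > y and ("eq", x, y) for x = y, and the function name consist, are assumptions chosen to mirror mkmj and mkeq.

```python
from itertools import product
from typing import FrozenSet, Tuple

# Hypothetical encoding: ("mj", x, y) means x > y, ("eq", x, y) means x = y.
Fact = Tuple[str, str, str]


def consist(w: Fact, facts: FrozenSet[Fact]) -> bool:
    """Decide CONSIST(w, C) for a toy theory of order relations between values.

    Brute force: look for an assignment of integers to the value names that
    satisfies every fact of C together with w.
    """
    goal = set(facts) | {w}
    names = sorted({v for _, x, y in goal for v in (x, y)})
    # With n names, values in range(n) suffice to realise any consistent
    # combination of > and = constraints (at most n distinct levels).
    for values in product(range(len(names)), repeat=len(names)):
        env = dict(zip(names, values))
        if all((env[x] > env[y]) if op == "mj" else (env[x] == env[y])
               for op, x, y in goal):
            return True
    return False
```

In an empty context every relation between H1 and H2 is consistent, which is exactly why CMPLT1 licenses all three branches in C0.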

If s-context C2 has been introduced while solving the complete problem in
context C1, then all the facts which are true about the state corresponding to C1
are also true in the state described by C2:

AXIOM CMPLT2: forall w c1 c2. (C2C(w,CMPLT,c1,c2) imp
  forall w1. (TH(w1,c1) imp TH(w1,c2)));

In our case, supposing we are in an initial situation C0 in which we do not
know the relation holding between H1 and H2, we can determine the three possible
extensions as the result of the following inferential activity in PSC:

1. Instantiate axiom CMPLT1 with "H1", "H2", "C0". Prove the hypotheses of
the implication and prove:

CBB(mkmj("H1","H2"),CMPLT,"C0"),
CBB(mkeq("H1","H2"),CMPLT,"C0"),
CBB(mkmj("H2","H1"),CMPLT,"C0")

2. Then, via axiom PSC1 we can prove:

TH(mkmj("H1","H2"),mkcxt(mkmj("H1","H2"),CMPLT,"C0")),
TH(mkeq("H1","H2"),mkcxt(mkeq("H1","H2"),CMPLT,"C0")),
TH(mkmj("H2","H1"),mkcxt(mkmj("H2","H1"),CMPLT,"C0"))

3.  Supposing  we  get 

mkcxt(mkmj("H1","H2"),CMPLT,"C0")="C1"
mkcxt(mkeq("H1","H2"),CMPLT,"C0")="C2"
mkcxt(mkmj("H2","H1"),CMPLT,"C0")="C3"

by substitution of equals, we have:

C2C(mkmj("H1","H2"),CMPLT,"C0","C1")   TH(mkmj("H1","H2"),"C1")
C2C(mkeq("H1","H2"),CMPLT,"C0","C2")   TH(mkeq("H1","H2"),"C2")
C2C(mkmj("H2","H1"),CMPLT,"C0","C3")   TH(mkmj("H2","H1"),"C3")

and, by means of the reflection down principle, assert that H1 is >, =, < H2
in C1, C2, C3 respectively.

Note that in this case it is useless to apply axiom CMPLT2, since all the
theorems in C0 (i.e. {w : TH(w, C0)}) also hold in C1, C2 and C3.
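The three steps of the complete problem can be mirrored by a small Python sketch, under the simplifying assumptions that the facts of C0 and the candidate relations are plain strings and that the three relations are mutually exclusive (a toy stand-in for the CONSIST inference). Each consistent relation yields one child context, named C1, C2, C3 in order as in the text; the child starts from all facts of C0, which is what axiom CMPLT2 would contribute, plus the assumed relation. The names RELATIONS, consistent and solve_complete are hypothetical.

```python
from typing import Dict, FrozenSet

# The three candidate relations between H1 and H2 (toy string encoding).
RELATIONS = ("H1>H2", "H1=H2", "H2>H1")


def consistent(rel: str, facts: FrozenSet[str]) -> bool:
    """Toy stand-in for CONSIST: the relations are mutually exclusive, so rel
    is consistent unless a different relation is already among the facts."""
    return not any(other in facts for other in RELATIONS if other != rel)


def solve_complete(c0: FrozenSet[str]) -> Dict[str, FrozenSet[str]]:
    """One child s-context per relation consistent with C0 (CMPLT1 + PSC1).

    Each child holds all facts of C0 (as CMPLT2 would give) plus the
    assumed relation, which reflection down asserts inside the child.
    """
    children: Dict[str, FrozenSet[str]] = {}
    candidates = [rel for rel in RELATIONS if consistent(rel, c0)]
    for i, rel in enumerate(candidates, start=1):
        children[f"C{i}"] = c0 | {rel}
    return children
```

If C0 already fixed the relation (say "H1=H2"), the same procedure would produce a single child, which is the expected behavior of a consistency-based extension.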




3.2  Solving  the  predict  problem 

Considering the situation modeled in context C1, in which H1 > H2, and
supposing we want to predict the next relation plausibly holding between the two
values, we could assume CBB("H1=H2",PRDCT,"C1") and assert (via PSC1 and
Rdown) H1=H2 in a context C4.

A more refined solution, which obeys the limit analysis as usually carried
out in Qualitative Reasoning [DKB84, For84, Kui86], is also not difficult to
provide. In fact, for believing that there is a possible situation in which
v1 = v2 holds, it is sufficient that one value v1 is greater than another value v2
and that the former is decreasing or the latter increasing [Kui86]:

AXIOM PRDCT1: forall v1 v2 c.
  ((TH(mkmj(v1,v2),c) and (TH(mkincr(v2),c) or TH(mkdecr(v1),c))) imp
   CBB(mkeq(v1,v2),PRDCT,c));

In the same way, if two values are equal, it is sufficient that one value is changing
for believing that one of them is going to become greater than the other:

AXIOM PRDCT5: forall v1 v2 c.
  ((TH(mkeq(v1,v2),c) and (TH(mkincr(v1),c) or TH(mkdecr(v2),c))) imp
   CBB(mkmj(v1,v2),PRDCT,c));

Of course the axioms are so simple because we consider a very simple quantity
space (-, 0, +) for derivative values. However, the framework is easily extensible
to more complex quantity spaces.

Considering the U-tube example, from the facts we have derived solving the
"complete" problem and from the general description, we have:

in C1: H1>H2, INCREASE(H2), DECREASE(H1)
in C2: H1=H2, STEADY
in C3: H2>H1, INCREASE(H1), DECREASE(H2)

For example, let us consider how the state in C1 evolves:

1. Prove TH(mkmj("H1","H2"),"C1") and TH(mkincr("H2"),"C1") via reflection
up.

2. Instantiate axiom PRDCT1 with "H1", "H2", "C1" and prove:

CBB(mkeq("H1","H2"),PRDCT,"C1");

3. Supposing we get mkcxt(mkeq("H1","H2"),PRDCT,"C1")="C4", we have

C2C(mkeq("H1","H2"),PRDCT,"C1","C4")   TH(mkeq("H1","H2"),"C4")

4. By means of the reflection down principle assert H1=H2 in C4.
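The two prediction axioms can be read as transition rules over qualitative states. The following Python sketch is a hypothetical encoding, not GETFOL code: facts are tuples ("mj", v1, v2) for v1 > v2 and ("eq", v1, v2) for v1 = v2, and derivative signs come from the quantity space (-, 0, +). It returns the relations that PRDCT1 and PRDCT5 allow to be believed, trying both orderings of the two values as the universally quantified axioms do.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: ("mj", v1, v2) is v1 > v2, ("eq", v1, v2) is v1 = v2.
Rel = Tuple[str, str, str]


def holds(rel: Rel, facts: Set[Rel]) -> bool:
    kind, x, y = rel
    if kind == "eq":  # equality is symmetric
        return ("eq", x, y) in facts or ("eq", y, x) in facts
    return rel in facts


def predict(facts: Set[Rel], deriv: Dict[str, str],
            a: str = "H1", b: str = "H2") -> List[Rel]:
    """Relations r for which CBB(r, PRDCT, c) follows from PRDCT1/PRDCT5.

    deriv maps each value to its derivative sign: "-", "0" or "+".
    """
    def incr(v: str) -> bool:
        return deriv.get(v) == "+"

    def decr(v: str) -> bool:
        return deriv.get(v) == "-"

    believed: List[Rel] = []
    for v1, v2 in ((a, b), (b, a)):
        # PRDCT1: v1 > v2 and (v2 increasing or v1 decreasing) => v1 = v2
        if holds(("mj", v1, v2), facts) and (incr(v2) or decr(v1)):
            believed.append(("eq", v1, v2))
        # PRDCT5: v1 = v2 and (v1 increasing or v2 decreasing) => v1 > v2
        if holds(("eq", v1, v2), facts) and (incr(v1) or decr(v2)):
            believed.append(("mj", v1, v2))
    return believed
```

For the state of C1 (H1 > H2, H1 decreasing, H2 increasing) the sketch yields only ("eq", "H1", "H2"), matching step 2; for the steady state of C2 it yields nothing.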




4  Final  considerations 

What  have  we  achieved? 

From a representational point of view, we have a simple model of each state,
and in each s-context we have only those facts which are "relevant" for the
state being modeled (with clear computational advantages). We have a clear
distinction between the (FOL) model of the states and the metatheoretic
principles allowing us to derive the evolution. In PSC we have complete knowledge
of the whole tree of states; this enables us to consider and apply whatever
constraint we want on the evolution. For example, if we want to consider
(metatheoretic) principles enabling us to cut off "useless" or "impossible" states
(e.g. determined via principles like those in [KC87]), we can simply mark such
states (that is, s-contexts) as "no good" (e.g. by having a
PSC predicate NOGOOD(c)). Such s-contexts are not to be taken into account
while generating the evolution tree, and this can be obtained simply by putting in
the hypotheses of each axiom of PSC the condition that we are speaking about a
"good" context.
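The NOGOOD guard can be sketched in Python as a breadth-first construction of the evolution tree in which a context marked no good is kept as a leaf but never expanded, which is the effect of adding "not NOGOOD(c)" to the hypotheses of every PSC axiom. The successor map and the function name evolution_tree are hypothetical; in the text, successors are produced by the CMPLT and PRDCT axioms.

```python
from collections import deque
from typing import Dict, List, Set


def evolution_tree(root: str, successors: Dict[str, List[str]],
                   nogood: Set[str]) -> Dict[str, List[str]]:
    """Breadth-first evolution tree in which NOGOOD contexts are not expanded."""
    tree: Dict[str, List[str]] = {}
    queue = deque([root])
    while queue:
        c = queue.popleft()
        if c in tree:
            continue  # already expanded
        # Mirrors the extra hypothesis "not NOGOOD(c)" on every PSC axiom:
        # no axiom fires from a no-good context, so it stays a leaf.
        children = [] if c in nogood else list(successors.get(c, []))
        tree[c] = children
        queue.extend(children)
    return tree
```

Marking C3 as no good, for instance, keeps C3 in the tree but prevents any of its successors from being generated.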

From a formal point of view, inside each context we have all the nice
properties FOL has (e.g. clear semantics and proof theory). Solving the predict
problem, we avoid contradiction by isolating each state in separate s-contexts; in
[GW88] this kind of non-monotonicity has been called inessential. Solving the
complete problem, we make "consistency-based" extensions to the theory. The key
point in our framework is axiom PSC1, which (together with the reflection
principles) allows the s-contexts corresponding to non-monotonic extensions
of PSC to be defined safely.

We also provide an implementation within the GETFOL system [GT91]. GETFOL
is an interactive system based on Natural Deduction [Pra65] extending the FOL
system developed at Stanford by Richard Weyhrauch [Wey80]. GETFOL turns out
to be appropriate for our purposes since, among other features (e.g. the
many-sorted language), it allows the user to define and handle multiple contexts.

5  Acknowledgements 

We are indebted to Fausto Giunchiglia for the many invaluable GETFOL&FOL
suggestions we had. The Mechanized Reasoning Group in Trento and Genova, the
Formal Reasoning Group in Stanford, Marco Cadoli, John McCarthy, Carolyn
Talcott and Richard Weyhrauch are also thanked for having helped the second
author in various ways. This work has been supported by the Italian National
Research Council (CNR), Progetto Finalizzato Sistemi Informatici e Calcolo
Parallelo (Special Project on Information Systems and Parallel Computing).

References 

[Bob84] D.G. Bobrow. Qualitative reasoning about physical systems. Artificial
Intelligence, 24, 1984.

[DKB84] J.H. de Kleer and J.S. Brown. A qualitative physics based on confluences.
Artificial Intelligence, 24:7-83, 1984.

[For84] K.D. Forbus. Qualitative process theory. Artificial Intelligence, 1984.

[Giu91] F. Giunchiglia. Multilanguage systems. In Proceedings of AAAI Spring
Symposium on Logical Formalizations of Commonsense Reasoning, 1991. Also
IRST-Technical Report no. 9011-17.

[GS89] F. Giunchiglia and A. Smaill. Reflection in constructive and
non-constructive automated reasoning. In H. Abramson and M. H. Rogers,
editors, Proc. Workshop on Meta-Programming in Logic Programming, pages
123-145. MIT Press, 1989. Also IRST-Technical Report 8902-04 and DAI
Research Paper 375, University of Edinburgh.

[GT91] F. Giunchiglia and P. Traverso. GETFOL User Manual - GETFOL version 1.
Manual 9109-09, IRST, Trento, Italy, 1991. Also MRG-DIST Technical
Report 9107-01, DIST, University of Genova.

[GW88] F. Giunchiglia and R.W. Weyhrauch. A multi-context monotonic
axiomatization of inessential non-monotonicity. In D. Nardi and P. Maes, editors,
Meta-level Architectures and Reflection, pages 271-285. North Holland, 1988.
Also MRG-DIST Technical Report 9105-02, DIST, University of Genova, Italy.

[KC87] B.J. Kuipers and C. Chiu. Taming intractable branching in qualitative
simulation. In Proc. IJCAI conference, pages 1079-1085. International Joint
Conference on Artificial Intelligence, 1987.

[Kle52] S.C. Kleene. Introduction to Metamathematics. North Holland, 1952.

[Kui86] B.J. Kuipers. Qualitative simulation. Artificial Intelligence, 29:289-338,
1986.

[McC90] J. McCarthy. Generality in Artificial Intelligence. In J. McCarthy, editor,
Formalizing Common Sense - Papers by John McCarthy, pages 226-236.
Ablex Publishing Corporation, 1990.

[McC91] J. McCarthy. Notes on formalizing context. Unpublished, 1991.

[Pra65] D. Prawitz. Natural Deduction - A Proof-Theoretical Study. Almqvist and
Wiksell, Stockholm, 1965.

[Sho91] Y. Shoham. Varieties of context. In V. Lifschitz, editor, Artificial
Intelligence and Mathematical Theory of Computation - Papers in Honor of John
McCarthy, pages 393-408. Academic Press, 1991.

[WDK89] D.S. Weld and J.H. de Kleer. Readings in Qualitative Reasoning about
Physical Systems. Morgan Kaufmann Publishers, Inc., 95 First Street, Los
Altos, CA 94022, 1989.

[Wey80] R.W. Weyhrauch. Prolegomena to a Theory of Mechanized Formal
Reasoning. Artificial Intelligence, 13(1):133-176, 1980.




Systematic  assessment  of  temporal  reasoning 
methods  for  use  in  autonomous  agents 


Erik  Sandewall 

Department  of  Computer  and  Information  Science 
Linkoping  University 
Linkoping,  Sweden 
E-mail: ejs@ida.liu.se


Abstract. Non-monotonic logics for reasoning about time and action
have traditionally been verified through their informal plausibility and
their application to a small number of test examples. I present a
systematic method using an underlying semantics, where a preferential
non-monotonic logic is correct if its set of maximally preferred models equals the
set of histories that can arise in a "scenario simulation" of the underlying
semantics. Several old and new logics have been analyzed using the
systematic method.

The systematic method was first applied to the relatively
straightforward case of scenarios with simple inertia, where non-deterministic
actions and actions with duration different from one timestep are the
only complications. In this extended abstract I first briefly review
the approach to simple inertia, and then describe how the systematic
approach can be extended to obtain a precise account of broader types of
frame problems, such as problems involving surprises and ramifications.


1  The  systematic  approach 

There has been much research in recent years on methods for reasoning about
actions and change, and on finding solutions for the so-called "frame problems".
The non-monotonic logics which have been proposed for this purpose have
traditionally been verified through their informal plausibility and their application
to a small number of test examples. The limitations of this approach have
become more apparent in recent years, and at the same time some results of a more
systematic nature have been presented, in particular by Lin and Shoham [LS91],
Lifschitz [Lif91], and Reiter [Rei91], all of whom have used a situation-calculus
framework. In this extended abstract I shall describe another approach, which
is based on an explicit treatment of time in a linear time domain and which is
able to accommodate a broader range of reasoning problems. The complete results
are available in a review version of a forthcoming book [San92].

By an assessment of a nonmonotonic logic, I mean a statement that for
a certain class of reasoning problems, the logic is guaranteed to always give
the correct set of conclusions: the set of intended conclusions is equal to the
set of actual conclusions obtained through the logic. A systematic approach to
nonmonotonic logic is one where it is possible to obtain such assessments with
formal proofs, based on precise definitions. A systematic approach requires in
particular that one makes precise definitions of the class of reasoning problems
in question, and of the set of intended conclusions.

I have developed such a systematic approach in model-theoretic terms, where
the correctness criterion is therefore that the set of intended models shall equal
the set of models actually obtained in the logic. A central part of the approach is
to use an ontological taxonomy, which provides the instrument for characterizing
classes of systems or "worlds" that the logic is used to reason about. For a
given logic, the assessment will specify a class within that taxonomy for which
the logic always obtains the correct results. This makes it easy to compare the
range of applicability of different logics.

The nonmonotonic logics that are to be assessed and compared in this fashion
can often be characterized in terms of a preference relation or a selection function
on models. A selection function maps any set S of interpretations to a subset
of S, and is used to map from the set of classical models for the given premises
to a smaller set of selected models. In one important special case, the selection
function obtains those members of S which are minimal according to a certain
preference relation, so the maximally preferred models are the selected ones.
More complex selection functions are needed to account for broader classes of
reasoning problems. In particular one may need to divide the premises into
several partitions, and to apply different selection functions to the classical model
sets for the different partitions.
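In the special case of a preference relation, the selection function can be sketched directly. The Python below is a generic illustration under stated assumptions: models are hashable values, and prefer(x, y) is a strict partial order meaning "x is strictly preferred to y". A model is selected when no other model is strictly preferred to it. The usage line encodes a model by the set of fluents it changes and prefers proper subsets; that particular preference is only a hypothetical example, not one of the entailment methods assessed in the text.

```python
from typing import Callable, Iterable, Set, TypeVar

M = TypeVar("M")


def select_minimal(models: Iterable[M], prefer: Callable[[M, M], bool]) -> Set[M]:
    """Selection function induced by a strict preference relation:
    keep each model to which no other model is strictly preferred."""
    pool = list(models)
    return {m for m in pool if not any(prefer(other, m) for other in pool)}


# Usage (hypothetical encoding): a model is the set of fluents it changes,
# and fewer changes (a proper subset) is preferred.
models = [frozenset({"a"}), frozenset({"a", "l"}), frozenset({"r"})]
selected = select_minimal(models, lambda x, y: x < y)
```

Here the model changing both a and l is discarded, while the two incomparable models changing only a or only r are both selected.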

The  main  steps  in  the  development  of  the  approach  are  as  follows. 

1. Semiformal definition of an overview taxonomy, using a set of specialities, i.e.
features of reasoning problems. The following are some of the ontological
specialities, i.e. those which characterize the world that is being reasoned
about:

C:  Concurrent  actions  are  allowed. 

L: Delayed effects are allowed, i.e. effects of an action that take place after
the action has been completed.

T: Timebound objects are allowed, i.e. objects that are created and
terminated at some points in time.

The overview taxonomy also specifies the epistemological assumptions, i.e.
what assumptions are made about the completeness of the information in
the given premises. Here I shall only consider the case that all actions are
known, and that all potential effects of actions are also known, although
nondeterministic actions are allowed. This case will be represented as K.
The simplest category in the overview taxonomy is K-IA, representing the
K assumption, integer time, sequential actions, and an inertia assumption for all
fluents (i.e. the value of a fluent does not change except directly due to an
action), but where non-deterministic actions and actions with a duration in time
(not just a single time-step) are allowed. Broader categories are formed e.g.
as K-IAC if the requirement for sequential actions is dropped, or K-IAL
if delayed effects are allowed.

2. For each category of reasoning problems in the overview taxonomy, one
defines a corresponding underlying semantics in a formal and precise fashion.
I use an underlying semantics which captures the basic A.I. intuitions,
similar to the "agent model" of Genesereth and Nilsson [GN87]. The primary
purpose of the underlying semantics is to define the set of intended
models for a given set of premises. Indirectly, the underlying semantics also
serves as a precise definition of the problem family at hand. It is also the
basis for the subsequent steps of analysis.

3. Next, a more fine-grained taxonomy is defined within the overview category
at hand, by introducing subordinate specialities. For example, within the
K-IA category we define the more restricted classes K-IsA, where all
actions must be performed over a single time-step, K-IAd, where all actions
must be deterministic (the state of the world when the action starts
completely determines the state of the world when the action ends), and a number
of other classes. These sub-specialities can also be freely combined, e.g. as
K-IsAd.

4. Naturally there also needs to be a certain number of syntactic definitions in
order to specify how formulae are to be written and organized. Besides the
usual kind of syntax definitions that are needed in any logic, some of the
non-monotonic logics require that one separates the given premises into several
distinct sets called partitions (for example one set of "action statements"
and another set of "observations"), and there may be special constraints on
the syntax of formulae in each of those premise partitions. The syntax in
this extended sense must be related to the underlying semantics as well as
to the conventional (Tarskian) semantics for the logic.

5. At this point all the conceptual tools are in place, and the assessment can be
performed. The result of the assessment must be a statement that for a given
class of reasoning problems, defined in the fine-grained taxonomy, and
for a given non-monotonic logic, the set of intended models as defined by
the underlying semantics equals the set of selected models as obtained by
the logic. For example, if the selection function in the non-monotonic logic
is based on a preference relation ≪, then it is the set of ≪-minimal models
that must equal the set of intended models. Naturally a formal proof of the
assessment statement is also required.
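The shape of an assessment statement (intended models equal selected models for every problem in a class) can be turned into a simple test harness. The Python sketch below, with hypothetical names, returns the problems in a finite sample on which the two model sets disagree. An empty result is only evidence for the assessment; as step 5 requires, the actual statement still needs a formal proof over the whole class.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, TypeVar

P = TypeVar("P")


def counterexamples(problems: Iterable[P],
                    intended: Callable[[P], FrozenSet[Hashable]],
                    selected: Callable[[P], FrozenSet[Hashable]]) -> List[P]:
    """Problems in the sample where the logic's selected models differ
    from the intended models defined by the underlying semantics."""
    return [p for p in problems if intended(p) != selected(p)]
```

In practice, intended would be computed by simulating the underlying semantics and selected by the entailment method under assessment.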

This methodology de-dramatizes the issue of correctness for an applied
non-monotonic logic. If the range of applicability for logic A is a subset of the range
for logic B, then there may still be good reasons for using logic A, namely if
one's application satisfies the restrictions that A imposes and if A can be
implemented more efficiently than B. This is particularly important since logics (or
more precisely, non-monotonic entailment criteria) that suffice for a broad range
of common-sense reasoning problems tend to become quite complicated.

This systematic methodology was first applied to the case of simple inertia,
where non-deterministic actions and actions with duration different from one
timestep are the only complications. The results for simple inertia have been
presented in the book manuscript [San92] and in a separate paper [San93]. In
this extended abstract I will describe how the results for simple inertia can be
extended to obtain a precise account of broader types of frame problems, such
as problems involving surprises and ramifications. However, I must begin with a
brief review of the approach to simple inertia, since the new results build on it.

In order to keep this paper short and to focus on the general approach and the
main results, I omit a number of details and fine points in the formal definitions.
Please refer to the book manuscript if a clarification is needed.

2  Simple  inertia 

It is natural to first address the simple category K-IA, and to analyze it
in detail. This family of reasoning problems is broader than the ones which
have been considered by Shoham, Lifschitz, and Reiter, in particular by allowing
the combination of actions with extended duration in time and non-deterministic
actions such as flipping a coin. For non-deterministic actions, the assumption
of complete knowledge is that there are several possible outcomes of the action,
even for a given state of the world when the action starts, but the set of possible
outcomes is known. More generally, to also account for actions with extended
duration, the set of possible trajectories is assumed to be known. Each trajectory
specifies the duration of the action, the set of fluents which are changed in
the course of the action, and the value of each of those fluents in each
time-step within the trajectory. In other words, each trajectory is a finite sequence of
partial states of the world. Corresponding to the frame assumption of inertia, all
fluents which are not included in the trajectory are assumed to remain constant
for the duration of the action.

The underlying semantics for K-IA captures these concepts, and is defined
as follows. If r is a state of the world and A is an action, then Infl(A, r) is a set of
features (names of fluents), and Trajs(A, r) is a set of trajectories, each of which
is a finite sequence of partial states where exactly the features in Infl(A, r)
are assigned values. This means that the set of influenced fluents must be kept
constant throughout a trajectory, and must be the same for all trajectories for
a given A and r.

A reasoning problem is assumed to be stated in terms of (1) a set of action
laws characterizing the effects of actions, (2) a set of action statements specifying
which actions are performed and restrictions on their timing, and (3) a set
of observations specifying the values of fluents at one or more points in time.
For a given set of action laws, one can construct the corresponding functions
Infl and Trajs with the structure which has just been defined, and consider
them as the operative definition of the "world". The full definitions of the
underlying semantics are such that the action laws uniquely define the world
(Infl, Trajs).

The set of intended models for a given set of premises is then defined as
follows. Consider a non-deterministic game between an "ego" and the world,
where the two players take turns to perform moves, and a history in the world
is constructed as the game proceeds. The ego is essentially Newell's "knowledge
level", and the "world" is understood as the combination of the agent's physical
environment and its own physical body or vehicle.

At the start of the game, time has the value zero and the world is in an
arbitrary state. At each step in the game, one has constructed a history over a
finite interval of time [0, n], where n represents "now". Each move by the ego
consists of issuing an internal command to perform an action A at time n. The
only effect of that invocation in the history is to add the information that action
A started at time n and is still in process.

When the world has the move, it implements the action that the ego initiated,
as follows. Let r be the state of the world at time n, i.e. the present ending state
of the history, and suppose the ego has invoked action A starting at time n. The
world selects an arbitrary member of Trajs(A, r). Suppose that member has
length k. The world will then extend the present history to [0, n + k] by
adding k new states, which are obtained from the chosen trajectory by allowing all
other features (those not included in Infl(A, r)) to remain constant. Then
it is the ego's turn to move again. When the ego has run out of moves, the
constructed finite history is extrapolated to infinity without any changes of state.
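The world's move can be sketched in Python as follows, assuming a history is a list of total states (dicts from feature names to booleans) indexed by time, and a trajectory is a list of partial states. The function name world_move is hypothetical. Features in Infl(A, r) take their values from the chosen trajectory; all others are copied from the state at time n, which is the inertia assumption.

```python
from typing import Dict, List, Set

State = Dict[str, bool]


def world_move(history: List[State], infl: Set[str],
               trajectory: List[State]) -> List[State]:
    """Extend a history over [0, n] to [0, n + k] with a trajectory of k
    partial states; features outside infl keep their value from time n."""
    now = history[-1]
    extended = list(history)
    for partial in trajectory:
        if set(partial) != infl:
            raise ValueError("each partial state must assign exactly Infl(A, r)")
        step = {f: v for f, v in now.items() if f not in infl}  # inertia
        step.update(partial)  # values chosen by the world's trajectory
        extended.append(step)
    return extended
```

For the Fire action of the shooting example, starting where the turkey is alive, the gun is loaded and it is not raining, the two-step trajectory extends a history of length 1 to length 3 while r stays unchanged.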

For any given reasoning problem, one uses its action laws to construct (Infl,
Trajs), and then constructs the set of all histories (games) for all possible initial
states, all possible ego behaviors and all possible choices by the world when it
selects a member of Trajs(A, r). Among those histories, one selects those which
(1) satisfy all the action statements, (2) satisfy all the observations, and (3) do
not contain any actions besides those included among the action statements. The
resulting set of histories is by definition the set of intended ones, and sets the
standard for what models the non-monotonic logic is supposed to prefer or select.
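The selection of intended histories can be sketched as a filter. In this Python sketch the histories are assumed to be already generated, each as a list of states plus the set of actions invoked, written as (name, start time) pairs. Action statements are simplified to exact (name, start time) pairs, whereas the text also allows looser restrictions on timing, and observations are (time, fluent, value) triples. Under this simplification conditions (1) and (3) together amount to set equality between the invoked actions and the stated ones. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

State = Dict[str, bool]
Action = Tuple[str, int]             # (action name, start time)
Observation = Tuple[int, str, bool]  # (time, fluent, value)
History = Tuple[List[State], FrozenSet[Action]]


def value_at(states: List[State], t: int, fluent: str) -> bool:
    # After the ego's last move the history is extrapolated without change.
    return states[min(t, len(states) - 1)][fluent]


def intended(histories: List[History], action_statements: FrozenSet[Action],
             observations: List[Observation]) -> List[History]:
    selected = []
    for states, actions in histories:
        if actions != action_statements:  # (1) all stated actions, (3) no others
            continue
        if all(value_at(states, t, f) == v for t, f, v in observations):  # (2)
            selected.append((states, actions))
    return selected
```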

With these definitions it has been possible to assess the range of
applicability of a number of previously proposed entailment methods, such as different
variants of chronological minimization, uses of filtering and occlusion, causal
minimization, explanation closure, etc. These assessments have been presented
in [San92, San93].

Notice that under this definition of intended models, it is quite possible to
have inconsistent reasoning problems, where the set of intended models is empty.
For example, in the stolen car scenario the assumptions are that if the car is left
overnight in the garage then it is still there in the morning, and that the car is left
in the garage over two successive nights. Nevertheless, after two nights it is gone. This
is an inconsistent reasoning problem in K-IA, i.e. under the underlying semantics
that has just been defined. Any non-monotonic logic that obtains a non-empty
set of preferred models for this example is therefore incorrect for K-IA. If one
intends the set of models in this example to be non-empty, then one needs
to use another, broader category in the taxonomy of reasoning problems. In
particular the speciality S for "surprises" allows for additional changes of fluent
values, besides those specified by (Infl, Trajs), which usually do not occur but
which may need to be assumed in order to avoid obtaining an empty model set.
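The stolen car scenario can be checked mechanically with a tiny self-contained simulation in Python. Assumptions made for the sketch: a single fluent in_garage, each night is one time-step of a hypothetical overnight action whose Infl is empty (so inertia keeps the car where it is), and the observations are that the car is in the garage at time 0 and gone at time 2. Every history generated by the underlying semantics keeps the car in the garage, so the set of intended histories is empty.

```python
from typing import Dict, List, Tuple

Observation = Tuple[int, str, bool]  # (time, fluent, value)


def stolen_car_histories(observations: List[Observation]) -> List[List[Dict[str, bool]]]:
    histories = []
    for initial in (True, False):  # the world starts in an arbitrary state
        states = [{"in_garage": initial}]
        for _ in range(2):  # two successive nights, one time-step each (assumed)
            # Infl of the overnight action is empty, so its only trajectory
            # assigns no feature and inertia copies the previous state.
            states.append(dict(states[-1]))
        histories.append(states)
    # Keep only the histories that agree with every observation.
    return [h for h in histories if all(h[t][f] == v for t, f, v in observations)]
```

Dropping the morning observation leaves exactly one intended history, in which the car is still in the garage.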

The formal analysis which leads up to assessment results for K-IA is based
on the use of a full trajectory normal form, FTNF, for action laws. Consider for
example a shooting world with three propositional fluents, namely a ("the turkey
is alive"), l ("the gun is loaded"), and r ("it is raining"). There is also one action,
Fire, which has the effect of unloading the gun, if it was loaded, and of killing
the turkey iff the gun was loaded. The rain does not affect these outcomes, and
the rain itself is not affected. Furthermore, for the sake of the example, assume
that in the case where the turkey is alive and the gun is loaded the action takes
two time units, where the gun becomes unloaded during the first time-step, and
the turkey dies in the second time-step. In all other cases it takes one time-step.

Under these assumptions the set Trajs(Fire, r) will have exactly one member
for each choice of r. Table 1 specifies, for each choice of r, the value of
Infl(Fire, r) and the single member of Trajs(Fire, r).

Table 1. The world description (Infl, Trajs)

  r                  Infl(Fire, r)   Trajs(Fire, r)
  {a:T, l:T, r:T}    {a, l}          {⟨{a:T, l:F}, {a:F, l:F}⟩}
  {a:T, l:T, r:F}    {a, l}          {⟨{a:T, l:F}, {a:F, l:F}⟩}
  {a:T, l:F, r:T}    ∅               {⟨∅⟩}
  {a:T, l:F, r:F}    ∅               {⟨∅⟩}
  {a:F, l:T, r:T}    {l}             {⟨{l:F}⟩}
  {a:F, l:T, r:F}    {l}             {⟨{l:F}⟩}
  {a:F, l:F, r:T}    ∅               {⟨∅⟩}
  {a:F, l:F, r:F}    ∅               {⟨∅⟩}

The FTNF of the action law for Fire is
$$
\begin{aligned}
[s,t]\,\mathit{Fire} \leadsto\;
 & (([s]\,a \land l \land r) \Rightarrow ([s,t]\,a := F \land l := F) \land t = s+2 \land [s+1]\,a \land \lnot l) \land{} \\
 & (([s]\,a \land l \land \lnot r) \Rightarrow ([s,t]\,a := F \land l := F) \land t = s+2 \land [s+1]\,a \land \lnot l) \land{} \\
 & (([s]\,a \land \lnot l \land r) \Rightarrow t = s+1) \land{} \\
 & (([s]\,a \land \lnot l \land \lnot r) \Rightarrow t = s+1) \land{} \\
 & (([s]\,\lnot a \land l \land r) \Rightarrow ([s,t]\,l := F) \land t = s+1) \land{} \\
 & (([s]\,\lnot a \land l \land \lnot r) \Rightarrow ([s,t]\,l := F) \land t = s+1) \land{} \\
 & (([s]\,\lnot a \land \lnot l \land r) \Rightarrow t = s+1) \land{} \\
 & (([s]\,\lnot a \land \lnot l \land \lnot r) \Rightarrow t = s+1)
\end{aligned}
$$

The construct [s,t] f := x designates that the fluent f changes its value some
time(s) during the time interval [s,t] and finally obtains the value x. The
operator ⇝ may be read as implication for the present purpose.

It is seen that this formula is a direct translation of the information in the
table into logical formulae. The similarity is an advantage in the formal proofs,
but for practical purposes one would of course use a simpler formula which is
logically equivalent to the full trajectory normal form, such as

$$
\begin{aligned}
[s,t]\,\mathit{Fire} \leadsto\;
 & (([s]\,a \land l) \Rightarrow ([s,t]\,a := F) \land t = s+2 \land [s+1]\,a \land \lnot l) \land{} \\
 & (([s]\,\lnot a \lor \lnot l) \Rightarrow t = s+1) \land{} \\
 & [s,t]\,l := F
\end{aligned}
$$

Since most of the entailment methods are defined in terms of model sets, the transformation to a classically equivalent formula does not influence their sets of selected or preferred models. However, the formal proofs of assessments for various entailment methods rely on the direct correspondence between the FTNF syntax and the underlying semantics.
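As a rough sanity check on the simplified law, one can compare the histories that the two formulas predict from each of the eight starting states. The sketch below is only an approximation, not a proof of classical equivalence: it ignores occlusion, reads $[s,t]\, f := x$ merely as "f has the value x at t", and assumes that features without an assignment keep their starting values. The clause encoding and the function `predict` are hypothetical.

```python
from itertools import product

T, F = True, False

# Hypothetical clause encoding: (condition on the state at s, consequent).
# Consequent keys: "dur" (t - s), "end" ([s,t] f := x), "mid" (values at s+1).
def when(a, l, r):
    return lambda st: (st["a"], st["l"], st["r"]) == (a, l, r)

BOTH = {"dur": 2, "end": {"a": F, "l": F}, "mid": {"a": T, "l": F}}
UNLOAD = {"dur": 1, "end": {"l": F}}
NOP = {"dur": 1}

FTNF = [(when(T, T, T), BOTH), (when(T, T, F), BOTH),
        (when(T, F, T), NOP), (when(T, F, F), NOP),
        (when(F, T, T), UNLOAD), (when(F, T, F), UNLOAD),
        (when(F, F, T), NOP), (when(F, F, F), NOP)]

SIMPLIFIED = [
    (lambda st: st["a"] and st["l"], {"dur": 2, "end": {"a": F}, "mid": {"a": T, "l": F}}),
    (lambda st: not st["a"] or not st["l"], {"dur": 1}),
    (lambda st: True, {"end": {"l": F}}),  # the unconditional [s,t] l := F
]


def predict(law, s0):
    """Conjoin every clause whose condition holds at s; return (duration, states)."""
    dur, end, mid = None, {}, {}
    for cond, cons in law:
        if cond(s0):
            if "dur" in cons:
                assert dur in (None, cons["dur"]), "inconsistent durations"
                dur = cons["dur"]
            end.update(cons.get("end", {}))
            mid.update(cons.get("mid", {}))
    # Assumption: unassigned features keep their starting values; the state
    # at s+1 is only listed separately for the two-step case.
    middle = [{**s0, **mid}] if dur == 2 else []
    return dur, middle + [{**s0, **end}]


for vals in product([T, F], repeat=3):
    s0 = dict(zip("alr", vals))
    assert predict(FTNF, s0) == predict(SIMPLIFIED, s0), s0
print("FTNF and simplified law predict the same history in all 8 starting states")
```

The one case where the two formulas look different, the unconditional $[s,t]\, l := F$ applied when l is already false, turns out to predict no change of value.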


3  The  frame  problems 

3.1  Towards  common-sense  formulations  of  action  laws 

The primary formulation of the action laws which was used in the analysis for K-IA, namely the full trajectory normal form, reflects the structure of the underlying semantics. Another possible choice of primary formulation would be to use the “most natural” expression of common-sense facts in logic. As the example showed, there is a fairly big difference between these two possible choices of primary formulations of action laws, due to the precise structure of the underlying semantics.

Let me refer to these choices as the semantics-motivated and the common-sense-motivated forms for expressing action laws. From an A.I. point of view it is the common-sense-motivated form(s) that one really wants to capture. However, that form is notoriously difficult to pinpoint, and the semantics-motivated form has the advantage of an exact definition which makes a precise analysis possible. My approach is therefore to start with the semantics-motivated form of the action laws, and to introduce and analyze various transformations and extensions of the semantics-motivated form which may bring it closer to the common-sense-motivated forms.

This approach differs from the standard approach in the A.I. study of common-sense reasoning, which has been to start from the common-sense-motivated form of the action law, and to inquire how the inference machinery of the logic would have to be chosen in order to obtain the intended conclusions from those formulae. The problem with the standard approach is that it does not provide a reliable basis for continued formal analysis, and useful general results have therefore not been obtained. I start instead from the underlying trajectory semantics, and ask how the logic (syntax and entailment conditions) can be modified in the direction of more natural or common-sense-like formulations of the action laws and of the other components of the chronicle.

A small step in this direction was already taken above with the observation that one can replace each action law in FTNF by a simplified formula which is more compact but logically equivalent (in the sense of classical logic, i.e. having the same model set). However, such local rewriting will only obtain a moderate improvement in the naturalness of the action laws.




A major desideratum for more natural action laws is that one should be able to “factor out” some information from the laws, i.e. to make separate statements of some of the relevant information. The action law can then be restricted to expressing the “typical” or “primary” effects of the action, whereas statements about “unusual” or “indirect” effects are written in other formulae. It seems plausible that this resembles how we tend to describe the effects of actions in common-sense terms. From an operational point of view there are several advantages to such a reformulation: the action laws will be more easily legible, and circumstances which are shared between several action laws can be stated just once rather than repeatedly inside each law. I shall use the terms main action law and supplementary laws for the two types of statements.

The  main  action  law  will  be  assumed  to  have  the  same  syntactical  form  as  was 
exemplified  for  plain  action  laws  above.  In  particular  it  defines  a  certain  number 
of  features  which  are  affected  by  the  action,  and  (constraints  on)  their  values 
during  and  at  the  end  of  the  action.  On  this  assumption  the  supplementary  laws 
serve  to  characterize  three  phenomena: 

- Surprises: Unusual changes in additional features besides those stated in the main law.

- Ramification: Normal changes in additional features, for example due to cause-effect chains.

- Qualification: Unusual exceptions to the main law, whereby some of the features change in another fashion (including the case of no change at all) than what is specified in the main law.

Clearly there is a certain arbitrariness as to the separation between main law and supplementary laws, and these three categories are only well defined for a given choice of main laws.

I distinguish, therefore, between the basic issue of inertia, whose analysis was outlined in the previous section, and the three modularization-oriented issues of surprises, ramification, and qualification. These are essentially the same as the traditionally recognized aspects of the “frame problem”. My claim here is that the traditional frame problem can be analyzed rigorously in terms of simple inertia (the K-IA class of reasoning problems, as defined in the previous section) plus the decomposition of action laws into main action laws and supplementary laws.

The division that is used here differs somewhat from the usual one in the literature, since surprises are usually considered as an aspect of the “inertia problem” or of the “ramification problem”, and have not been identified as a separate issue in their own right. Apart from that, the distinction between the problems of inertia, ramification, and qualification is the generally accepted one.


3.2  The  systematic  perspective 

In order to analyze the distinction between main action laws and supplementary laws, it is not sufficient to rewrite each action law to a classically equivalent formula. Two other and stronger methods will be used:




- Premise transformation: Transformations on the whole chronicle, i.e. on the sets of premises, including transfer of information between the premise partitions, in ways which do not change the set of selected models.

- Extensions of the semantics: Modifications of the underlying semantics, and corresponding addition of more premise partitions as well as definition of entailment criteria that use the additional partitions.

The  shift  towards  common-sense-motivated  forms  for  action  laws  is  performed 
in  the  first  case  by  rewriting  the  specific  set  of  premises  to  an  equivalent  form, 
and  in  the  second  case  by  changing  the  logic  itself. 

Surprises and qualifications involve notions of genuine preferences and defaults: in a choice between a model that involves a surprise and one that does not, one should prefer the latter. They therefore cannot be accommodated in the trajectory semantics for K-IA, so an extension of the semantics as well as additional specialities are needed. I will identify the phenomenon of surprises with the speciality code S, and analyze qualifications in terms of a concept of normalcy which has the speciality code N.

Ramifications offer a more complex picture. One possible viewpoint is to exclude ramifications from the underlying semantics. The trajectory-semantics definition of the world is retained as for K-IA, where it is assumed that the trajectories for a given action and starting state characterize all changes of feature values. The ramification problem is then only an issue of how a given world description, in trajectory semantics, can be correctly expressed using a combination of main action laws and supplementary laws. In such a semantics-preserving approach to ramification, premise transformations are the natural instrument.

The other possible viewpoint is that the world description of the form (Infl, Trajs) that is used for K-IA shall only correspond to the main action laws, and that some other device is needed in the underlying semantics corresponding to the supplementary laws. This means in particular that the definition of the “game” between ego and world becomes more complex. The world’s moves can no longer be described just in terms of selecting and imposing one trajectory; there must also be some update policy about how to modify the finite history with secondary changes corresponding to the supplementary laws. This semantics-extending view of ramification opens a number of possibilities, and several additional specialities are required to account for ramification in this view.

3.3  Recycling  vs.  embedding  of  entailment  criteria 

Since an extension of the underlying semantics will be needed for surprises, qualification, and some approaches to ramification, we shall repeatedly encounter situations where a number of entailment criteria have been defined and analyzed for a family Z of reasoning problems, and similar criteria are needed for a larger family Z' ⊃ Z. For example, we may wish to use our entailment criteria for K-IA as a basis for criteria for K-IAS. There are two possible ways of proceeding:




- To check whether some of the known methods for Z will apply also to the extended family, or at least to some part of it which can be defined using a sub-speciality. For example, maybe some method for K-IA will also work for a family K-IASx for some sub-speciality x under the speciality S. The existing criterion is recycled* to apply for the additional speciality.

- To define a more complex entailment criterion, for example by introducing an additional premise category or an additional preference relation. The expression for the selected models according to the new criterion contains the model-selection expression for the old criterion as a sub-expression. The new criterion is formed as an embedding of the existing one.

As a rule of thumb, recycling ought to be preferred since it avoids the need for introducing more complexity in entailment criteria. However, it is not always possible to proceed by recycling, so embedding may sometimes be the only possibility besides inventing an entirely new entailment criterion. If and when entailment methods are chosen on the basis of representative examples, one must be particularly careful not to choose a recycled criterion merely because it works for a few examples in the extended family.
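The difference between recycling and embedding can be sketched with selection functions over sets of models. In this hypothetical Python sketch a model is just a set of atoms, `old` stands in for an existing criterion (here simply "fewest changes", not one of the assessed criteria), and `embed` adds one further selection function on top of it, so that the new selection expression contains the old one as a sub-expression. The order of the two minimizations and all names are assumptions made for the illustration.

```python
from typing import Callable, FrozenSet, Set

# Hypothetical model representation: the set of atoms true in the model,
# e.g. "chg:a@11" for a change of a at time 11, "surprise:x" for a surprise.
Model = FrozenSet[str]
Selector = Callable[[Set[Model]], Set[Model]]


def count(prefix: str) -> Callable[[Model], int]:
    return lambda m: sum(1 for atom in m if atom.startswith(prefix))


def minimize(cost: Callable[[Model], int]) -> Selector:
    """Selection function keeping the models of least cost."""
    def select(models: Set[Model]) -> Set[Model]:
        if not models:
            return set()
        best = min(cost(m) for m in models)
        return {m for m in models if cost(m) == best}
    return select


def embed(old: Selector, cost: Callable[[Model], int]) -> Selector:
    """New criterion whose selection expression contains `old` as a sub-expression."""
    extra = minimize(cost)
    return lambda models: extra(old(models))


old = minimize(count("chg:"))          # stand-in for an existing criterion
new = embed(old, count("surprise:"))   # one additional selection function

m1 = frozenset({"chg:a@11", "surprise:x"})
m2 = frozenset({"chg:a@11"})
m3 = frozenset({"chg:a@11", "chg:d@11"})
print(old({m1, m2, m3}))  # recycled criterion: m1 survives despite its surprise
print(new({m1, m2, m3}))  # embedded criterion: only m2 survives
```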

4 Brief overview of assessments for surprises, ramification, and qualification

4.1  Surprises 

It is clear that surprises should be minimized non-chronologically, and that it is the cardinality or the combined weight of the surprises that is to be minimized rather than the set of surprises. The only exception is that if one can assume that there is at most one surprise, then of course minimization by cardinality and minimization by set inclusion are equivalent.
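The contrast can be illustrated under the assumption that each candidate model is summarized by its set of surprises. In the hypothetical sketch below, the candidates {x} and {y, z} cannot be discriminated by set inclusion, whereas cardinality (or total weight) prefers {x}; when no candidate has more than one surprise, the two minimizations select the same candidates.

```python
from typing import Callable, FrozenSet, List

Surprises = FrozenSet[str]  # hypothetical: the set of surprises in one model


def inclusion_minimal(cands: List[Surprises]) -> List[Surprises]:
    """Keep candidates that have no proper subset among the candidates."""
    return [s for s in cands if not any(o < s for o in cands)]


def weight_minimal(cands: List[Surprises],
                   weight: Callable[[str], float] = lambda _: 1) -> List[Surprises]:
    """Keep candidates of least total weight (cardinality when all weights are 1)."""
    def total(s: Surprises) -> float:
        return sum(weight(x) for x in s)
    best = min(total(s) for s in cands)
    return [s for s in cands if total(s) == best]


two_ways = [frozenset({"x"}), frozenset({"y", "z"})]
print(inclusion_minimal(two_ways))  # both survive: neither set contains the other
print(weight_minimal(two_ways))     # only {x}: one surprise beats two

at_most_one = [frozenset(), frozenset({"x"}), frozenset({"y"})]
assert inclusion_minimal(at_most_one) == weight_minimal(at_most_one)
```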

Since the minimization is to be done non-chronologically, recycling approaches for K-IA to apply for K-IAS will only work for non-chronological minimization methods. However, such methods have an extremely limited range of applicability. In practice it is therefore only of interest to use embedding approaches, where one makes a separate minimization of the surprises included in the models, thereby increasing by one the number of selection functions that occur in the entailment criterion.

4.2  Ramification 

The  “ramification  problem”  has  been  defined  from  the  point  of  view  of  common- 
sense-motivated  formulations  for  action  laws,  and  concerns  how  one  shall  take 
care  of  all  the  normal,  indirect  effects  of  an  action,  where  the  direct  effects  of 
the  action  are  by  definition  those  that  are  expressed  using  the  main  action  law. 
I  will  recognize  three  types  of  phenomena  within  this  general  definition: 

*  The  term  “extended”  would  also  have  been  appropriate,  but  is  already  used  for  other 
purposes  in  non-monotonic  logics. 




- Changes which occur after the termination of the action, as a result of cause-effect chains with delays. This case will be counted under the L speciality for delayed effects, and is not considered in this paper.

- Changes in features of objects which are not included as arguments of the action. This phenomenon will be referred to as dependencies or structural ramification. The full definition of the semantics for K-IA makes an explicit assumption that there are no dependencies, and leaves the case with dependencies to the broader family K-IAD of reasoning problems. Dependencies will not be further discussed here.

-  Changes  in  additional  features  of  objects  which  are  included  as  arguments 
of  the  action,  including  changes  in  features  without  arguments.  This  will  be 
considered  as  the  speciality  of  local  ramification,  named  by  the  speciality 
code  U,  and  is  the  topic  of  the  present  subsection. 

The concrete treatment of ramification in the A.I. literature is restricted to the case of local ramification. The distinction between structural and local ramification is quite important in the underlying semantics, since the proofs for assessment of currently used entailment criteria depend critically on the assumption that there are no dependencies.

One possible way of motivating ramification and of formulating supplementary laws is in terms of constraints, i.e. propositions which are supposed to hold at all times. Constraints may have the form

$$\Box\,\kappa$$

where $\Box$ represents “always” and $\kappa$ is a fluent formula with certain syntactic restrictions, expressing a condition that must hold momentarily at each timepoint and which does not refer to previous timepoints or to current change. Constraints of that form will be said to be synchronic. In the existing literature, most authors identify ramification with the use of synchronic constraints.


Semantics-preserving approaches accommodate constraints on the syntactic level, but without changing the underlying semantics or the entailment criteria. Then the constraints do not define any additional ontological speciality. Let us consider this possibility in the basic case of the K-IA ontological family, with trajectory semantics. The underlying semantics defines the world in terms of a pair (Infl, Trajs). If the underlying semantics is unchanged, the trajectories in Trajs(A, r) still represent all changes of feature values when an action A is performed from the starting state r. Even within this rigid framework, there are two reasonable uses for constraints. Active constraints represent an approach to ramification; passive constraints actually have nothing to do with ramification and are instead a kind of derived invariants.

In the context of ramification, the purpose of constraint formulae which are used as active constraints is to serve as an economy measure when writing out action laws. Suppose one is given a K-IA world where the discrete state domain $\mathcal{R}$ contains only a subset of all the feature assignments that are allowed by the logic. For example, if a represents “it is alive” and d represents “it is dead”, then a feature assignment where both a and d are assigned the value T is not included in the set $\mathcal{R}$; it is not considered to ever occur. The constraint $\Box\kappa$ characterizes this state of affairs precisely if $\mathcal{R}$ equals the model set for $\kappa$.
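The condition that $\mathcal{R}$ equals the model set of $\kappa$ can be checked by enumeration for the alive/dead example. In the sketch below the state domain `R` is an assumption (the two states allowed by “dead xor alive”), and the weaker formula ¬(a ∧ d) is included only for contrast: it excludes the forbidden assignment but also admits a state outside `R`.

```python
from itertools import product

FEATURES = ("a", "d")  # a: "it is alive", d: "it is dead"


def model_set(kappa, features=FEATURES):
    """All feature assignments (as value tuples) that satisfy the fluent formula kappa."""
    return {vals for vals in product([True, False], repeat=len(features))
            if kappa(dict(zip(features, vals)))}


# Hypothetical state domain R for "dead xor alive": exactly one of a, d holds.
R = {(True, False), (False, True)}


def xor(st):       # a <-> ¬d
    return st["a"] == (not st["d"])


def not_both(st):  # weaker: only excludes a = d = T
    return not (st["a"] and st["d"])


print(model_set(xor) == R)       # True: the constraint characterizes R precisely
print(model_set(not_both) == R)  # False: it also admits a = d = F
```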

In such a case, all trajectories in Trajs(A, r) will be consistent with the constraints. The full trajectory normal form for the world has been assumed to enumerate all changes explicitly, but one can also consider an equivalent set of premises consisting of maximally simple action laws, which are intended to be complemented by the separate statement of the constraints. In each action law the consequent will just consist of the appropriate occlusion statements** for all f in Infl(A, r), combined with the minimal amount of other information about the action’s consequences. The additional constraints on the values of the occluded (possibly changing) fluents are provided by the supplementary laws, which are constraint formulae.


Semantics-extending approaches may deal with constraints in a manner similar to surprises. The basic logic for inertia, i.e. for the ontological family K-IA, defines in a very restrictive way what changes are possible, but if these definitions are applied without exception then one obtains a contradiction. In the case of surprises it was the observations that generated a contradiction and a need for escape; in the case of ramification it is instead the constraints that create this need.

Using again the “dead xor alive” constraint

$$\Box\,(a \leftrightarrow \neg d)$$

as an example, suppose one also has the observation

$$[10]\; a \wedge \neg d,$$

saying that a is true and d is false at time 10, as well as the action statement

$$[10,12]\; \mathit{Kill},$$

saying that the “Kill” action takes place over the interval from time 10 to time 12, and the action law

$$[s,t]\; \mathit{Kill} \leadsto [s,t]\; a := F,$$

which does not imply any possibility of change in d. Under the assumptions of K-IA one will obtain an empty intended-model set, and an additional device in the underlying semantics is required in order to save the situation.
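The empty intended-model set can be reproduced by brute force. The sketch assumes that $[s,t]\, f := x$ occludes f at the time points s+1, ..., t and fixes its value x at t, and that every non-occluded feature keeps its previous value (inertia); it enumerates all histories of a and d over times 10 to 12. The second run occludes d as well, as a maximally simple action law written for active constraints would. This is not an implementation of minimization of change, and all names are hypothetical.

```python
from itertools import product

TIMES = (10, 11, 12)
FEATURES = ("a", "d")  # a: alive, d: dead


def histories():
    """Every assignment of truth values to (feature, time) pairs."""
    keys = [(f, t) for t in TIMES for f in FEATURES]
    for vals in product([True, False], repeat=len(keys)):
        yield dict(zip(keys, vals))


def is_model(h, occluded):
    observation = h[("a", 10)] and not h[("d", 10)]                     # [10] a ∧ ¬d
    action_law = not h[("a", 12)]                                        # [10,12] a := F
    constraint = all(h[("a", t)] == (not h[("d", t)]) for t in TIMES)    # □(a ↔ ¬d)
    inertia = all(h[(f, t)] == h[(f, t - 1)]                             # no change unless occluded
                  for f in FEATURES for t in TIMES[1:] if (f, t) not in occluded)
    return observation and action_law and constraint and inertia


occludes_a = {("a", 11), ("a", 12)}
occludes_a_and_d = occludes_a | {("d", 11), ("d", 12)}

plain = [h for h in histories() if is_model(h, occludes_a)]
active = [h for h in histories() if is_model(h, occludes_a_and_d)]
print(len(plain))   # 0: inertia keeps d false, contradicting the constraint at 12
print(len(active))  # 2: d may now change, and the constraint forces d at 12
```

In the second run the two remaining histories differ only in whether the change of a and d happens at time 11 or at time 12.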

The presently most popular choice of that additional device is to use “minimization of change”, or more precisely the minimization of the set of changes. However, a precise analysis shows that this update policy has serious flaws which seem impossible to repair except by very drastic changes to the underlying logic.

4.3  Normality 

The speciality of normality concerns what is otherwise known as the qualification problem, i.e. the cases where some of the changes that are stated in the main action law do not take place, or take place differently than stated there. The current A.I. literature only contains one concrete reasoning example of qualification,

** Statements about the possibility of change, whose only purpose is to override the default of inertia.




namely the “potato in the tailpipe” example, and surprisingly few proposals for how to deal with the problem. Also, it is quite easy to find counterexamples to those approaches, such as the “tailpipe marauder” example.

At this point I cannot report any positive results about assessed entailment methods for the speciality of normality. However, the underlying semantics and the other aspects of the systematic methodology have at least made it possible to define the qualification problem in more precise terms than before.

5  Conclusion 

For many years, research on common-sense reasoning has been troubled by the choice between two evils. A lot of the reported research has been formal and precise, but apparently of little relevance to actual common-sense reasoning problems. A lot of other research has addressed actual problems, but using a very ad hoc methodology, in particular by relying too much on representative examples of common-sense reasoning.

With the systematic methodology which is now emerging for this research, it becomes possible at last to make precise and formally proven statements about whole classes of problems and about the general applicability of logic variants, and not merely about single instances of reasoning problems. As one would expect in a formal approach to any problem area, the systematic methodology starts by analyzing a relatively simple case in depth, namely the K-IA class of reasoning problems, and then builds on it in order to analyze successively broader and more useful classes of problems. In this way we can gradually address the plethora of logical complexities which have been assembled under the term “the frame problem”, but with a firm and reliable basis at each step in the analysis.



GGD:  Graph  Grammar  Developer 
for  features  in  CAD/CAM 


Christoph  Klauck  and  Johannes  Schwagereit 

German Research Center for Artificial Intelligence Inc. (DFKI), ARC-TEC Project
Mailing address: P.O. Box 2080, D-6750 Kaiserslautern
Telephone: ++49-631-205-3477, E-mail: klauck@dfki.uni-kl.de


Abstract. To integrate CA*-systems with other applications in the world of CIM, one principal approach currently under development is based on feature representation. It enables any CIM component to recognize the higher-level entities (the so-called features) from a lower-level data exchange format, which might be the internal representation of a CAD system as well as some standard data exchange format. In this paper we present a 'made-to-measure' editor for representing features in the higher-level domain-specific representation language FEAT-REP, a representation language based on a (feature-) specific attributed node labeled graph grammar. This intelligent tool, called GGD for short, supports the knowledge engineer during the representation process by structuring the knowledge base using a conceptual language and by verifying several characteristics of the features.


1  Motivation 

Research in feature-based CA*-systems like Computer Aided Design (CAD) or Computer Aided Process Planning (CAPP) has been motivated by the understanding that geometric models represent a workpiece in greater detail than can be utilized, e.g., by a designer or process planner. When CA*-experts look at a workpiece, they perceive it in terms of their own expertise: the so-called features. Features are domain- and company-specific description elements based on the geometrical and technological data of a workpiece that an expert in a domain associates with certain information [2]. They are built upon a syntax (shape description: geometry and technology, given here by productions of a graph grammar) and a semantics (description of related information, e.g. skeletal plans in manufacturing or functional relations in design), and they provide an abstraction mechanism to facilitate, e.g., the creation, manufacturing or analysis of workpieces or, more generally, to bridge the gap between several systems in the world of Computer Integrated Manufacturing (CIM). Features that are required, e.g., for design may differ considerably from those required, e.g., for manufacturing or assembly, even though they may be based on the same geometric and technological entities [6].

Representing features is thus one necessary step to bridge the gap between several CA*-systems and an important step towards truly Computer Integrated Manufacturing. The expected advantages of a close coupling of CA*-systems are:




The information interchange should lead to better knowledge transfer, shorter turnaround times and improved feedback. In the end, higher flexibility and generally better results are expected.

In current research, one method to represent features is based on graph grammars (cf. [3, 6, 15]). This is a well-established field of research and provides a powerful set of methods, like parsing, as well as knowledge about problems, their complexity and how they can be solved efficiently [5]. Taking the characteristics of features into consideration, 'made-to-measure' tools must be developed to make the recognition and representation process more efficient.


From this point of view, we present in this paper an implementation of the high-level domain-specific feature representation language FEAT-REP [10]. This implementation is realized by the Graph Grammar Developer (GGD), an intelligent tool that supports users of FEAT-REP in filling the knowledge base with definitions of features.

2 What are Features?

To become more familiar with the effect of feature characteristics on our representation formalism, we would like to briefly introduce the most important characteristics of its descriptions. Detailed explanations and the analogy to graph grammars can be found in [10]. Some of the most important syntactical characteristics of features are:




Fig.  2.  Overlapping  of  shoulders  or  angles 


Similar definitions: In an application the knowledge base containing the feature descriptions will be large. Many of these descriptions will be similar to each other, descriptions for a single feature as well as those for various features, with respect to a has-parts (e.g. figure 1) and an is-a hierarchy.

Component  overlapping:  Features  may  have  relations  to  features  of  different 
workpieces  (e.g.  bearing). 

Dependence of Dimensions: Depending on their dimensions, the same structures may be identified as different features (e.g. groove and insertion).

Fragmentizing: Parts of features are not always in direct neighborhood (e.g. FRAGM-LONGTURN in figure 1).

Ambiguity: In the terminology of features an expert often has different alternative descriptions for the same structure (e.g. groove or pocket).

Neighborhood: Feature descriptions form graphs of features and/or surfaces (see figure 3), where edges represent the neighborhood.

Interaction:  Areas  of  features  can  overlap  (see  figure  2). 

Additional characteristics are context sensitivity (e.g. LONGTURN-OUT, GROUND-OF-GROOVE in figure 1) and defectivity.


Fig. 3. Neighborhood graphs of surfaces




3  Attributed  Node  Labeled  Feature  Graph  Grammars 

In this section we briefly define the terminology of attributed node labeled graph grammars as used in this paper. A more detailed introduction and survey can be found, e.g., in [5].

In our paper the term (feature-) graph means an attributed finite undirected node labeled graph, in the sequel simply called a graph. Such a (feature-) graph FG is defined as a 4-tuple $FG := (V, E, \Sigma, \varphi)$, where $V$ is a finite (nonempty) set of attributed nodes, $E \subseteq V \times V$ is a set of undirected edges, $\Sigma$ is a finite (nonempty) alphabet of node labels or sorts and $\varphi$ is a labeling function, with $\varphi : V \to \Sigma$. Workpieces are represented by such graphs. The nodes of a workpiece graph represent geometric primitive surfaces, the node labels encode the type of the surface (e.g. cylinder jacket), the attributes carry detailed geometric and technological information (e.g. tolerances) and the edges encode the topology of the workpiece, i.e. two nodes are adjacent if the corresponding surfaces touch each other.
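The 4-tuple $(V, E, \Sigma, \varphi)$ maps directly onto a small data structure. In the Python sketch below, node ids, attribute names and surface labels are hypothetical, and excluding self-loops is an assumption added for workpiece graphs; undirected edges are represented as two-element frozensets.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass
class FeatureGraph:
    """FG := (V, E, Σ, φ); attribute layout and names are hypothetical."""
    V: Dict[str, Dict[str, object]]  # node id -> attributes (geometry, technology)
    E: Set[FrozenSet[str]]           # undirected edges {u, v}: the surfaces touch
    sigma: Set[str]                  # alphabet Σ of node labels (sorts)
    phi: Dict[str, str]              # labeling function φ: V -> Σ

    def __post_init__(self):
        assert self.V, "V must be nonempty"
        assert self.sigma, "Σ must be nonempty"
        for e in self.E:  # assumption: no self-loops
            assert len(e) == 2 and e.issubset(self.V)
        assert set(self.phi) == set(self.V) and set(self.phi.values()) <= self.sigma

    def neighbors(self, v: str) -> Set[str]:
        """Surfaces adjacent to (touching) surface v."""
        return {u for e in self.E if v in e for u in e if u != v}


shaft = FeatureGraph(
    V={"s1": {"radius": 20.0}, "s2": {}, "s3": {}},
    E={frozenset({"s1", "s2"}), frozenset({"s1", "s3"})},
    sigma={"cylinder_jacket", "plane"},
    phi={"s1": "cylinder_jacket", "s2": "plane", "s3": "plane"},
)
print(sorted(shaft.neighbors("s1")))  # ['s2', 's3']
```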

An attributed node labeled (feature-) graph grammar (ANLFGG) is a 4-tuple $GG := (T, N, P, goal)$, where $T$ is a finite (nonempty) set of terminals, $N$ is a finite (nonempty) set of non-terminals, $P$ is a finite set of productions and $goal \in N$ is the start node. A production (rule) $p \in P$ is a 4-tuple $(lhs, rhs, e, c)$ where $lhs \in N$ is a single node, the left-hand side of $p$, $rhs$ is a (nonempty) (feature-) graph over $T \cup N$, the right-hand side of $p$, $e$ is an embedding specification and $c$ is a finite set of conditions over $lhs$ and $rhs$, the so-called dependency relations. The conditions, or so-called constraints, $c$ serve two purposes: first, to verify or generate information by calculating attributes, and second, to lay down certain restrictions and attributes given by a description of a feature.

Most graph grammar formalisms are distinguished by their embedding specification $e$. In our case $e$ is defined such that an edge in a (feature-) graph of a derivation step always represents the neighborhood of its two incident nodes. For details of our ANLFGG and its analogy to features see [11] and [10].
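The 4-tuple structure of a production can be sketched in the same way. The illustrative Python below is not the GGD's own representation. It models the conditions as Python functions over a dictionary of node attributes, in place of the Lisp-like constraint language. One function generates an attribute of the left hand side and the other expresses a restriction. The embedding specification e is left out on purpose. The "shoulder" rule, its sorts and its attribute names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Attrs = Dict[str, Dict[str, float]]  # node name -> attribute name -> value


@dataclass
class Production:
    """Sketch of p = (lhs, rhs, e, c); the embedding e is not modeled here."""
    lhs: str                         # sort of the single left-hand-side node
    rhs_nodes: Dict[str, str]        # rhs node name -> sort (terminal or non-terminal)
    rhs_edges: Set[frozenset]        # undirected edges between rhs nodes
    conditions: List[Callable[[Attrs], bool]] = field(default_factory=list)

    def holds(self, attrs: Attrs) -> bool:
        # A condition may compute (generate) attributes as a side effect and
        # returns False if the restriction it expresses is violated.
        return all(cond(attrs) for cond in self.conditions)


@dataclass
class Grammar:
    terminals: Set[str]              # T
    nonterminals: Set[str]           # N
    productions: List[Production]    # P
    goal: str                        # start node, goal in N


def step_height(a: Attrs) -> bool:
    # Generating condition: derive an attribute of the lhs node from the rhs.
    a.setdefault("lhs", {})["height"] = abs(a["z1"]["z"] - a["z2"]["z"])
    return True


def larger_radius_first(a: Attrs) -> bool:
    # Restricting condition, in the spirit of comparing RADIUS_KLEIN with RADIUS_GROSS.
    return a["z2"]["radius"] < a["z1"]["radius"]


# Hypothetical "shoulder" rule: two cylinder jackets joined by a ring face.
shoulder = Production(
    lhs="shoulder",
    rhs_nodes={"z1": "cylinder_jacket", "z2": "cylinder_jacket", "r": "ring"},
    rhs_edges={frozenset(("z1", "r")), frozenset(("r", "z2"))},
    conditions=[step_height, larger_radius_first],
)
gg = Grammar({"cylinder_jacket", "ring"}, {"workpiece", "shoulder"}, [shoulder], "workpiece")

attrs = {"z1": {"z": 0.0, "radius": 20.0}, "z2": {"z": 15.0, "radius": 12.0}}
print(shoulder.holds(attrs), attrs["lhs"]["height"])  # True 15.0
```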

4  System  Architecture  of  GGD 

In contrast to other, more general tools for editing graph grammars (cf. [7, 9]), the GGD is specialized for editing FEAT-REP, the ‘made-to-measure’ (feature-) graph grammar formalism. Figure 4 shows the most important components of GGD and their interrelations.

The visualization component is the graphical user interface of the GGD. It allows the user to easily enter, view and manipulate feature definitions (see figure 7). Part of this component is a text editor (Constraints for . . . ) for entering conditions. All functions of the other components can be called from the designated menus. The GGD can also be used without the visualization component.

In figures 4 and 5 the visualization of a typical feature is shown. The user may add or delete nodes, neighborhoods and overlaps. The sort must be given for each node. A label, i.e. a second and more specific name chosen by the user, is optional but useful. The parser GraPaKL uses the numbers as a kind of heuristic that fixes the order in which it tries to find instances for the nodes, so they may change during the lifetime of the specified production. The optional labels make it more natural to identify the intended nodes when describing the conditions. Additional functions are provided to close or resize windows and to move nodes and edges.

Fig. 4. Structure of the GGD

As shown, features are entered as graphs, which is a very abstract way of representing them. To give a more vivid illustration, we are currently developing a tool that shows features as they would look as part of a workpiece. A first prototype is shown in figure 7.

Figure 5 shows the feature Pocket and the rules that form it, as represented by the GGD. The conditions belonging to the rules are not shown in this illustration.


Fig.  5.  The  rules  forming  a  pocket 


The representation component stores the entire knowledge. Several functions are provided for accessing and modifying the (feature-) graph grammar. This component integrates TAXON [8], a concept language based on KL-ONE (Knowledge Language ONE) that was developed for technical domains.

One drawback of concept languages based on KL-ONE is that all terminological knowledge has to be defined on an abstract logical level. In many applications like ours, one would like to be able to refer to concrete domains, and to predicates on these domains, when defining concepts. Examples of such concrete domains are the integers, the real numbers, or also non-arithmetic domains. The predicates could be equality, inequality or more complex predicates. TAXON realizes a general scheme for integrating such concrete domains into concept languages, rather than describing a particular extension by some specific concrete domain. The algorithms used for subsumption, instantiation and consistency checking are not only sound but also complete. They generate subtasks that have to be solved by a special-purpose reasoner for the concrete domain [1].


Fig.  6.  A  hierarchy  of  features 


Many feature definitions are similar to one another, and TAXON is used to handle this characteristic by defining a hierarchy of the productions. The right hand side of every production is compiled into a form suitable for storage in TAXON. We had to find a representation that lets the concept language compute exactly the subsumption hierarchy that we, or rather the expert, have in mind. This has to be done efficiently, as the (feature-) graph grammar may be large. In TAXON a production $a$ subsumes a production $b$ if $b$ is an expansion of $a$, i.e. $b$ can be generated from $a$ by inserting nodes into the right hand side of $a$.
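The expansion relation can be illustrated with a brute-force graph check. TAXON does not compute subsumption this way; it classifies compiled concept descriptions. The sketch below only makes the relation concrete, under a simplifying assumption. The nodes of a's right hand side must map injectively onto nodes of b's right hand side with the same sorts. The edges among the mapped nodes must be exactly the edges of a, so that b differs from a only by inserted nodes and the edges attached to them. Conditions are ignored, and the example rules are hypothetical.

```python
from itertools import permutations


def is_expansion(a_nodes, a_edges, b_nodes, b_edges):
    """True if rhs b can be obtained from rhs a by inserting nodes.

    a_nodes/b_nodes: node -> sort; a_edges/b_edges: sets of 2-element frozensets.
    Assumption: nodes of a map injectively onto equally sorted nodes of b, and
    the edges among the mapped nodes are exactly the (mapped) edges of a.
    Brute force over all mappings, so only suitable for small right hand sides.
    """
    a_ids = list(a_nodes)
    if len(a_ids) > len(b_nodes):
        return False
    for image in permutations(b_nodes, len(a_ids)):
        m = dict(zip(a_ids, image))
        if any(a_nodes[n] != b_nodes[m[n]] for n in a_ids):
            continue                                   # sorts must agree
        mapped = {frozenset(m[x] for x in e) for e in a_edges}
        induced = {e for e in b_edges if e <= set(image)}
        if mapped == induced:
            return True
    return False


# Hypothetical rules: b adds a ring face next to the plane of a.
a_nodes = {"z1": "cylinder_jacket", "p": "plane"}
a_edges = {frozenset(("z1", "p"))}
b_nodes = {"z1": "cylinder_jacket", "p": "plane", "r": "ring"}
b_edges = {frozenset(("z1", "p")), frozenset(("p", "r"))}
print(is_expansion(a_nodes, a_edges, b_nodes, b_edges))  # True: a subsumes b
print(is_expansion(b_nodes, b_edges, a_nodes, a_edges))  # False
```

The search is exponential in the size of the right hand sides. This is one reason a classifier working on compiled descriptions is preferable for a large grammar.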

Because many feature definitions are similar, TAXON holds two kinds of hierarchy: one for all feature definitions and one for every feature. Note that the latter is not just a part of the former.

Figure 6 shows a simple hierarchy of three features, in which the more specific shoulder features, such as Shoulder-2, are expansions of Shoulder-1. Note that our hierarchy is more extensive than just a subgraph relation. In future work the similarity of conditions will also be taken into account.

The GGD offers several consistency checks and verifies the defined grammar for soundness. These checks are performed during the development of a (possibly new) (feature-) graph grammar. The tests are adapted to our purpose: the aim is to prevent incorrect descriptions of features. This allows the user to detect and eliminate most errors as early as possible. Some of the tests performed are:

- A grammar cannot be used without a start node. So it has to be checked whether the start node has been defined and whether it appears on the right hand side of any production. In the case of manufacturing or design features this should be a production for workpiece.

- Our definition requires that every feature graph is connected. Therefore the system checks whether the right hand side of each production is connected.

- GGD verifies whether the defined grammar is sound [4]. This verification comprises the following checks:

• There is no non-terminal (production) with an empty right hand side.

• Every non-terminal can be expanded to a terminal graph. (Is there any senseless non-terminal?)

• For every non-terminal or terminal, the start node can be expanded to a graph containing this symbol. (Is there any unreachable symbol?)
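Several of these tests are classic graph and grammar algorithms. The sketch below shows one plausible way to implement the connectivity test and the three soundness checks. It is not GGD's code. For the soundness checks it treats each right hand side only as a collection of symbols. Embeddings and conditions are therefore ignored, although in practice they could make a formally productive non-terminal unusable. Productive non-terminals and reachable symbols are computed as fixed points, as in the usual context-free grammar algorithms. All symbol names in the usage example are hypothetical.

```python
from collections import deque


def is_connected(nodes, edges):
    """True if the undirected graph (nodes, edges) is connected.

    edges: iterable of 2-element frozensets; self-loops are not expected.
    """
    nodes = set(nodes)
    if not nodes:
        return False  # an empty right hand side is reported by another check
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:                      # breadth-first search from one node
        for w in adj[queue.popleft()] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes


def soundness_report(terminals, nonterminals, productions, goal):
    """productions: list of (lhs, rhs_symbols); rhs edges are irrelevant here."""
    empty = sorted({lhs for lhs, rhs in productions if not rhs})

    # Productive non-terminals (fixed point): some production of the symbol has a
    # nonempty rhs consisting only of terminals and already productive symbols.
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if rhs and lhs not in productive and all(
                    s in terminals or s in productive for s in rhs):
                productive.add(lhs)
                changed = True

    # Reachable symbols: closure from the start node over right hand sides.
    reachable, stack = {goal}, [goal]
    while stack:
        sym = stack.pop()
        for lhs, rhs in productions:
            if lhs == sym:
                for s in rhs:
                    if s not in reachable:
                        reachable.add(s)
                        stack.append(s)

    return {
        "empty_rhs": empty,
        "nonproductive": sorted(set(nonterminals) - productive),
        "unreachable": sorted((set(nonterminals) | set(terminals)) - reachable),
    }


# Hypothetical symbols and productions.
terminals = {"plane", "cyl", "ring"}
nonterminals = {"workpiece", "shoulder", "groove", "orphan"}
productions = [
    ("workpiece", ["shoulder", "plane"]),
    ("shoulder", ["cyl", "ring", "cyl"]),
    ("groove", ["groove", "cyl"]),   # can never be expanded to a terminal graph
    ("orphan", []),                  # empty right hand side
]
print(is_connected({"a", "b", "c"}, {frozenset("ab"), frozenset("bc")}))  # True
print(soundness_report(terminals, nonterminals, productions, "workpiece"))
# {'empty_rhs': ['orphan'], 'nonproductive': ['groove', 'orphan'], 'unreachable': ['groove', 'orphan']}
```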

The GGD also checks the syntax of the conditions. It tests whether every sort used by a production and its conditions is defined in the associated hierarchy. In addition, GGD checks that the hierarchy is free of cyclic definitions.
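Both hierarchy tests can be sketched in a few lines. The sketch assumes the sort hierarchy is given as a mapping from each sort to its declared supersorts; this input format is hypothetical. Cycle detection uses a depth-first search that marks the sorts on the current path. Reaching one of these marked sorts again closes a cycle.

```python
def find_cycle(supersorts):
    """Return one cycle of the sort hierarchy as a list of sorts, or None.

    supersorts: sort -> list of sorts it is declared a subsort of.
    """
    WHITE, GREY, BLACK = 0, 1, 2    # unvisited / on current path / finished
    color, path = {}, []

    def visit(sort):
        color[sort] = GREY
        path.append(sort)
        for sup in supersorts.get(sort, ()):
            state = color.get(sup, WHITE)
            if state == GREY:                       # back edge closes a cycle
                return path[path.index(sup):] + [sup]
            if state == WHITE:
                cycle = visit(sup)
                if cycle:
                    return cycle
        path.pop()
        color[sort] = BLACK
        return None

    for sort in list(supersorts):
        if color.get(sort, WHITE) == WHITE:
            cycle = visit(sort)
            if cycle:
                return cycle
    return None


def undefined_sorts(used, supersorts):
    """Sorts used by productions or conditions that the hierarchy never mentions."""
    defined = set(supersorts) | {s for sups in supersorts.values() for s in sups}
    return sorted(set(used) - defined)


# Hypothetical sort hierarchy.
hierarchy = {"shoulder": ["feature"], "pocket": ["feature"], "feature": []}
print(find_cycle(hierarchy))                               # None
print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))    # ['a', 'b', 'c', 'a']
print(undefined_sorts({"shoulder", "groove"}, hierarchy))  # ['groove']
```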

Some checks are performed when the user saves a new or changed production and its associated conditions to the knowledge base. The complete check (see figure 1: check rulebase) is performed only on request by the user.

The FEAT-REP compiler can read and write files in this specific graph grammar formalism [10]. These files represent the knowledge base containing the descriptions of features. They can be used by programs for recognizing features (parsing) and also by programs for feature-based design (generation).

The program Graph Parser KaisersLautern (GraPaKL, [11]) is a heuristic-driven, chart-based parser for our (feature-) graph grammars ANLFGG, adapted to recognize features of workpieces. The GraPaKL compiler translates the data stored in the representation component of GGD into files that GraPaKL can process, i.e. into its internal representation formalism. GraPaKL realizes an abstraction step by transforming the geometric and technological description of a workpiece into the qualitative level of the feature terminology. The expected result is a feature structure (see e.g. figure 1).




5  Developing  a  Feature  Graph  Grammar  with  GGD 

The most important components of our graph grammar ANLFGG are the set of productions specifying the feature definitions and a hierarchy of sorts in which every production is associated with one sort. If a production is defined in the GGD without specifying its associated sort, GGD automatically opens an editor for defining it.

To develop a feature graph grammar, we recommend the following sequence of steps:

- Define the set of sorts, specifying the super- and subsort relations. Running the consistency check on the knowledge base reveals any cycles that may have been defined. Additionally, GGD points out that there are no associated productions yet.

- Define the set of productions. A copy function can be used to specify similar rules. All conditions associated with a production also have to be specified. After a production has been defined, GGD automatically checks its (syntactical) correctness. It is also possible to check how TAXON classifies this production.

- Perform the consistency check for the grammar. After the set of sorts and the set of productions have been defined, the integrity of the knowledge base is checked in four stages. GGD may report errors or warnings.

- Save the defined grammar in a FEAT-REP file. GGD can read this file again to modify the defined grammar or to generate a file for the parser. Thus a knowledge base does not have to be defined in one session. Work can be interrupted even if errors occurred in the previous step. GraPaKL files should be saved only if there are no errors in the knowledge base.

A successful feature graph grammar presupposes a catalog that describes the features informally (syntax and semantics). Drawing up this catalog is a kind of knowledge acquisition step. In our experience, a typical sketch of each described feature makes this step easier and more effective. It is important that this step is performed together with a knowledge engineer, or at least with domain-specific acquisition tools (e.g. [13, 16]). After the feature graph grammar has been described, we recommend using GraPaKL to test the feature knowledge base on concrete workpieces.

6  Conclusion 

We introduced an intelligent system to support the representation and development of features in CAD/CAM. ‘Made-to-measure’ graph grammars, which are well suited to represent the characteristics of features, are used as a formal foundation. Our tool GGD for editing our ANLFGGs should be efficient enough to handle even large and sophisticated (feature-) knowledge bases. The computation of hierarchies and the enforcement of several integrity checks make an efficient development of the grammar possible.




The knowledge representation and the integration of TAXON are already implemented. To date, our CAPP system PIM (Planning In Manufacturing, [12]) uses this system to generate and maintain the knowledge base for manufacturing features. However, it can also be used as a domain-independent editor for the specified graph grammar.

Future extensions will include additional semantic checks, an improved user interface and a tool to generate graphs representing workpieces. GGD will also be integrated with the editor V-SKEP-EDIT [18], so that features and the associated skeletal plans can be described in one session. A visualization of the defined features as shapes will also be developed in future research. Figure 7 illustrates the currently implemented user interface of GGD. In one window the user can highlight the features on the workpiece that were recognized by the feature recognizer GraPaKL.


[Screenshot: GGD grammar development window with the menu entries Illustrate Example, Rule Operations, Edit Sorts, File Interface, Clear Rulebase and Stop GGD, a feature graph of cylinder (ZYLINDER) nodes, and a Constraints for ... window containing Lisp-like conditions]


Fig. 7.  User  interface  of  GGD 


GGD is currently being used by a mechanical engineer to specify design features [17]. The training period took about one week. No special knowledge of TAXON or of the semantic checks was needed. Only the syntax of the condition language (window Constraints for ...) took a little time to learn; it resembles COMMON-Lisp.




7  ACKNOWLEDGEMENTS 

This research was performed in the ARC-TEC project (Acquisition, Compilation and Representation of TEChnical knowledge) at the German Research Center for Artificial Intelligence Inc. (DFKI). It has been supported by a grant from the Federal Ministry for Research and Technology (FKZ ITW-8902 C4).

References 

1. Baader, F. et al.: A Scheme for Integrating Concrete Domains into Concept Languages. In: 12th IJCAI, pp. 452-457, 1991.

2. Chang, T.-C.: Expert Process Planning for Manufacturing. Addison-Wesley, 1990.

3. Chuang, S.-H. et al.: Compound Feature Recognition by Web Grammar Parsing. In: Research in Engineering Design, Springer-Verlag, pp. 147-158, 1991.

4. Drobot, V.: Formal Languages and Automata Theory. Computer Science Press, Freeman & Co., 1989.

5. Ehrig, H. et al.: Graph Grammars and Their Application to Computer Science. 1st-4th International Workshop, Springer-Verlag, LNCS 73, 153, 291, 532, 1979-1990.

6. Finger, S. et al.: Parsing Features in Solid Geometric Models. In: ECAI'90, pp. 566-572, 1990.

7. Goettler, H.: Graphgrammatiken in der Softwaretechnik. Informatik-Fachberichte 178, Springer-Verlag, 1989.

8. Hanschke, P. et al.: TAXON: A Concept Language with Concrete Domains. In: PDK'91, Springer-Verlag, LNAI 567, pp. 411-413, 1991.

9. Himsolt, M.: An Interactive Tool for Developing Graph Grammars. In: 4th International Workshop of [5], pp. 61-65, 1990.

10. Klauck, Ch. et al.: FEAT-REP: Representing Features in CAD/CAM. In: IV ISAI: Applications in Informatics, pp. 158-168, 1991.

11. Klauck, Ch. et al.: A Heuristic Driven Parser Based on Graph Grammars for Feature Recognition in CIM. In: SSPR'92, pp. 200-210, 1992.

12. Klauck, Ch. et al.: Heuristic Classification for Automated CAPP. In: 11th AAI-XP93, SPIE, 1993 (forthcoming).

13. Kuehn, O. et al.: Integrated Knowledge Acquisition from text, previously solved cases and expert memories. In: Applied Artificial Intelligence, vol. 5, pp. 311-337, 1991.

14. Mullins, S. et al.: Grammatical Approaches to Engineering Design, Part I: An Introduction and Commentary. In: Research in Engineering Design, Springer-Verlag, pp. 121-135, 1991.

15. Rinderle, R.: Grammatical Approaches to Engineering Design, Part II: Melding Configuration and Parametric Design Using Attribute Grammars. In: Research in Engineering Design, Springer-Verlag, pp. 137-146, 1991.

16. Schmidt, G.: Knowledge Acquisition from Text in a Complex Domain. In: 5th IEA/AIE-92, pp. 529-538, 1992.

17. Schulte, M. et al.: Recognition of Design Features from Product Models. In: 9th ICED'93, 1993 (forthcoming).

18. Wu, Z. et al.: Skeletal Plans Reuse: A Restricted Conceptual Graph Classification Approach. In: 7th AWCG, pp. 142-152, 1992.


A Knowledge-Based Approach to Group Analysis in Automated Manufacturing Systems

Kesheng  Wang 

Department  of  Production  Engineering 
The Norwegian Institute of Technology
N-7034  Trondheim,  Norway 


ABSTRACT Knowledge-based techniques have been applied to many fields, including the design and management of manufacturing systems. This paper focuses on how to build a knowledge-based system (KBGAS) for group analysis in automated manufacturing systems using the XPLAIN programming tool. The clustering algorithm follows Kusiak's method. This approach is suitable for integrating models and algorithms developed in production engineering with a knowledge-based system. The particular system (KBGAS) provides a user-friendly interface and can be operated without any special computer knowledge or knowledge of artificial intelligence.


1.  INTRODUCTION 

Knowledge-based systems have received considerable attention during the last few years. This interest stems from the recognition that technical knowledge in flexibly automated, computer-integrated manufacturing systems is increasingly becoming a prominent factor of production. Computer-integrated manufacturing (CIM) embraces both the technological and the administrative information flow. Data-producing and data-processing machines are included in a continuous flow of information. The purpose is to represent processes that need information in a transparent, available and nonredundant manner. Computer-aided design, planning, control, manufacturing and quality assurance, as well as knowledge-based systems, are to be linked into this information flow (14).

Group Technology (GT) is a philosophy for cellular manufacturing which has been given special consideration in automated manufacturing systems. Group analysis is a subtask of the group technology approach. Here it is understood as the task of partitioning the machines of a factory into independent groups, and of identifying exceptional machines and products which would otherwise prevent the task from being completed successfully (4). Group analysis aims at organizing the production system so that product traffic between different groups, or cells, is minimal. While it has several advantages compared to a functional layout, group technology may also cause problems such as workload imbalance (13).

Group analysis in group technology for automated manufacturing systems can be loosely formulated as follows: determine machine cells and part families with the minimum number of parts that visit more than one machine cell, and select a suitable material handling carrier with the minimum corresponding cost, subject to the following constraints:

1.  Processing  time  available  at  each  machine  is  not  exceeded. 

2.  Upper  limit  on  the  frequency  of  trips  of  material  handling 
carriers  for  each  machine  cell  is  not  exceeded. 




3. Number of machines in each machine cell does not exceed its upper limit or, alternatively, the dimension (e.g., the length) of each machine cell is not exceeded.
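A check of constraints 1-3 for one candidate cell might look as follows. This is a hedged sketch, not KBGAS's rule base. Constraint 2 is applied to a single carrier per cell, constraint 3 uses the machine count rather than the cell length, and the function and parameter names are assumptions. The processing times and available times of machines 3 and 7 are taken from the example in Section 8.1. The trip frequencies are made up for illustration.

```python
def violated_constraints(cell, family, proc, avail, trips, max_trips, max_machines):
    """Return the numbers (1-3) of the constraints a candidate cell violates.

    cell: machines of the cell; family: parts assigned to it
    proc[m][p]: processing time of part p on machine m (missing means 0)
    avail[m]: processing time available at machine m
    trips[p]: trip frequency of the cell's carrier required by part p
    """
    violated = []
    # 1. Processing time available at each machine is not exceeded.
    if any(sum(proc[m].get(p, 0) for p in family) > avail[m] for m in cell):
        violated.append(1)
    # 2. Upper limit on the trip frequency of the cell's carrier is not exceeded.
    if sum(trips[p] for p in family) > max_trips:
        violated.append(2)
    # 3. Number of machines in the cell does not exceed its upper limit.
    if len(cell) > max_machines:
        violated.append(3)
    return violated


proc = {3: {3: 20, 5: 10, 10: 22, 12: 8}, 7: {5: 1, 10: 7, 12: 7}}  # from Eq. (1)
avail = {3: 40, 7: 50}                                            # T from Eq. (1)
trips = {3: 4, 5: 6, 10: 9, 12: 8}                                # made-up values
print(violated_constraints({3, 7}, {3, 5, 10, 12}, proc, avail, trips, 40, 3))  # [1]
print(violated_constraints({3, 7}, {5, 10, 12}, proc, avail, trips, 40, 3))     # []
```

With part 3 in the family, machine 3 would need 60 time units against 40 available. Removing that part makes the cell feasible.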

Several algorithms and programs for performing group analysis have been developed. The simple method suggested by Falster (6) finds groups only in the somewhat unrealistic case where the groups already exist without any modifications to the production system. The graph-theory approach performs the group partition using a clique-finding algorithm (11). Some methods, e.g. the interclass traffic minimization method (7), attempt to find a partition in which some parts are still manufactured outside their associated groups. Other methods leave some parts and machines completely outside the partition (8). Further, a partition into groups can be obtained by allowing some parts outside their "own" cells as well as by introducing new machines (9).

The algorithm and formulation of the GT problem are not only computationally complex, but also involve constraints that are difficult to handle by any algorithm alone. Therefore, we promote the use of a knowledge-based system in the implementation of production system design tools.

XPLAIN is a programming tool available on UNIX workstations using the X-window system. It has been developed during the last four years at SINTEF-NTH. XPLAIN has been used to develop several commercial applications for the offshore industry (5, 15). We have also used XPLAIN to develop a welding expert system (2) and a knowledge-based computer-aided process planning system (10, 12). In this work we found XPLAIN quite suitable for integrating expert systems with industrial environments. In this paper we show how a Knowledge-Based Group Analysis System (KBGAS) for a flexible automated manufacturing system is developed using the programming tool XPLAIN.


2.  SYSTEM  ARCHITECTURE 

The architecture of the Knowledge-Based Group Analysis System (KBGAS) is shown in Fig. 1. It includes the database, knowledge base, model and algorithm base, inference engine, explanation module, knowledge acquisition module and user interface.

2.1 Input Data

The  input  data  required  by  KBGAS  fall  into  two  categories: 

-  Machine  data 

-  Part  data 

In  addition  to  these  data,  depending  on  the  characteristics  of 
the  manufacturing  system,  the  following  optional  data  can  be 
provided: 

-  Maximum  number  of  machines  in  a  machine  cell 

- Maximum frequency of trips that can be handled by a material handling carrier (for example, AGVs and robots)




2.2  Grouping  Process 

Prior  to  the  beginning  of  the  grouping  process,  KBGAS  constructs 
a  machine-part  incidence  matrix  based  on  the  data  provided  by  the 
user. 
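The sketch below shows how the matrix can be built. It assumes that the part data give, for each part, the machines it visits together with the processing times; this input format is hypothetical. Each entry holds a processing time, or 0 if the part does not visit the machine, as in the example of Section 8.1.

```python
def incidence_matrix(machines, routings):
    """Build a machine-part incidence matrix from part data.

    routings: part -> {machine: processing time}
    Returns (sorted part list, matrix) where matrix[i][j] is the processing
    time of part j on machine machines[i], or 0 if the part skips the machine.
    """
    parts = sorted(routings)
    matrix = [[routings[p].get(m, 0) for p in parts] for m in machines]
    return parts, matrix


# Hypothetical part data for two machines and three parts.
routings = {"P1": {"M1": 5, "M2": 3}, "P2": {"M2": 4}, "P3": {"M1": 7}}
parts, matrix = incidence_matrix(["M1", "M2"], routings)
print(parts)   # ['P1', 'P2', 'P3']
print(matrix)  # [[5, 0, 7], [3, 4, 0]]
```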


Figure 1. System architecture

Next, KBGAS initializes the data base, i.e. the objects representing facts known about the manufacturing system. Then the system forms machine cells and the corresponding part families. Each machine cell is formed by including one machine at a time. A machine is first analyzed for the possibility of inclusion in the machine cell. For example, a bottleneck machine, i.e., a machine that processes parts visiting more than one machine cell, is not included.

Each  time  a  machine  cell  has  been  formed,  the  KBGAS  checks 
whether  the  constraints  1-3  have  been  violated  and  removes  all 
parts  violating  the  constraints. 

For a machine cell that has been formed and analyzed by KBGAS, the corresponding machines and the parts forming the part family are removed from the machine-part incidence matrix. The system does not backtrack in the grouping process, i.e., once a machine cell is formed, the machines included in it are not considered for further machine cells.

2.3 Output Data

At  the  end  of  the  grouping  process,  KBGAS  prints  the  following 
data: 




- Machine cells formed:

   - machine cell number

   - list of machines in the machine cell

   - part family number

   - list of part numbers in the part family

- Part waiting list.

- List of machines not used.

- List of bottleneck machines.

- Maximum number of machines in a cell.


3.  DATA  BASE 

The data base stores the basic facts, or declarative knowledge, currently known about the problem domain in the form of objects and frames. Such data are obtained from the user in an interactive mode. The user is required to enter such data as follows:

- Machine frame: The machine frame contains information regarding each machine

-  Part  frame:  The  part  frame  contains  Information  regarding 
each  part 

-  Matrix-t  (machine-part  incidence  matrix):  The  machine-part 
incidence  matrix  is  constructed  by  the  system  based  on  the 
input  data 

-  Current  machine. 

-  List  of  candidate  machines. 

-  List  of  temporary  candidate  machines. 

-  Part  waiting  list. 

-  List  of  bottleneck  machines. 

-  List  of  temporary  bottleneck  machines. 

-  List  of  machines  not  used. 

- MC-k (machine cell k).

- PF-k (part family k).


4.  KNOWLEDGE  BASE 

The knowledge base stores the domain-specific procedural knowledge needed to solve problems, coded in the form of production rules. The knowledge base consists of six kinds of production rules:

1.  Preprocessing  rules  which  deal  with  the  initialization  of 
objects  in  the  data  base  that  are  not  provided  by  the  user. 

2. Current part rules which deal with the current parts being included in the part family, for example, whether a current part should be placed in the part waiting list.

3. Current machine rules which deal with the procedure from selecting candidate machines to selecting the current machine, for example, whether a selected machine can be put in the candidate machine list and whether a candidate machine can be selected as the current machine.

4. Machine part rules which check the appropriateness of a current machine for the machine cell being formed, for example, whether the current machine is a bottleneck machine.




5. Machine cell rules which deal with each machine cell that has been formed. Machine cell rules check for violation of constraints and remove parts violating them.

6.  Material  handling  selection  rules  which  deal  with  the 
selection  of  material  handling  carriers  for  a  formed 
machine  cell. 

Each rule has the following format:

Rule number (IF conditions THEN actions)


5.  THE  INFERENCE  ENGINE 

The inference engine examines facts and applies the rules stored in the knowledge base in accordance with the logical inference and control procedure. In this system, a forward-chaining control strategy is employed. Within a given class of rules, it attempts to fire all the rules that are related to the context considered. If a rule is triggered, i.e., its conditions are true, then the actions of the triggered rule are carried out. However, some rules stop the search of the knowledge base and send a message to the algorithm.
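The forward-chaining control for one class of rules can be sketched as below. The Rule class mirrors the "Rule number (IF conditions THEN actions)" format. A stop flag models the rules that end the search and pass a message to the algorithm. The fact names and the two example rules are hypothetical. XPLAIN's actual rule syntax and conflict resolution are not modeled. In this sketch each rule fires at most once per call, in list order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    number: int
    condition: Callable[[dict], bool]   # IF part, evaluated on the facts
    action: Callable[[dict], None]      # THEN part, updates the facts
    stop: bool = False                  # stop the search, hand back to the algorithm


def forward_chain(rules, facts) -> Optional[int]:
    """Fire the rules of one class until none applies or a stop rule fires.

    Each rule fires at most once per call, in list order; returns the number
    of the stop rule that ended the search, or None.
    """
    fired = set()
    progress = True
    while progress:
        progress = False
        for rule in rules:
            if rule.number in fired or not rule.condition(facts):
                continue
            rule.action(facts)
            fired.add(rule.number)
            progress = True
            if rule.stop:
                return rule.number
    return None


# Hypothetical "current part" rules for a cell containing machines 2 and 5.
facts = {"cell": {2, 5}, "current_part": 3, "machines_of": {3: {2, 3}},
         "waiting_list": []}
rules = [
    Rule(1, lambda f: not f["machines_of"][f["current_part"]] <= f["cell"],
         lambda f: f["waiting_list"].append(f["current_part"])),
    Rule(2, lambda f: f["current_part"] in f["waiting_list"],
         lambda f: f.update(message="current part moved to waiting list"),
         stop=True),
]
result = forward_chain(rules, facts)
print(result, facts["waiting_list"])  # 2 [3]
```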


6.  THE  USER  INTERFACE 

The user interface, representing the interaction between the user and the knowledge-based system, provides opportunities for the user to monitor the performance of the system, volunteer information, request explanations, and redirect the problem-solving approach.


7.  MODEL  AND  ALGORITHM  BASE 

The clustering algorithm developed by Kusiak (6) is an extension of the cluster identification algorithm (1). This algorithm is stored in the model and algorithm base and is used to formulate the group analysis problem. The specific model formulated depends on the nature of the problem, which is reflected in the data provided by the user.
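For reference, the basic cluster identification idea can be sketched on a 0/1 machine-part matrix. Start from one machine and add every part it processes. Then add every machine that processes one of those parts, and repeat until nothing changes. The machines and parts collected this way form a cell; they are removed and the procedure repeats, without backtracking. This is a simplified reading of the basic algorithm only. The constraint handling of Kusiak's extension is not modeled. The sketch is applied to the 0/1 pattern of the example in Section 8.1. There, part 3 links two cells into one, and the three cells reported in that section appear only after part 3 is set aside.

```python
def cluster_identification(incidence):
    """Basic cluster identification on a 0/1 machine-part matrix.

    incidence: machine -> set of parts it processes.
    Returns a list of (machines, parts) cells, each as sorted lists.
    """
    remaining = {m: set(ps) for m, ps in incidence.items()}
    cells = []
    while remaining:
        seed = min(remaining)                  # deterministic choice of start row
        machines, parts = {seed}, set(remaining[seed])
        grown = True
        while grown:                           # grow until no machine shares a part
            grown = False
            for m, ps in remaining.items():
                if m not in machines and ps & parts:
                    machines.add(m)
                    parts |= ps
                    grown = True
        cells.append((sorted(machines), sorted(parts)))
        for m in machines:                     # remove the cell, never revisit it
            del remaining[m]
    return cells


# 0/1 pattern of Eq. (1): machine -> parts it processes.
example = {1: {2, 4, 8}, 2: {1, 3, 6}, 3: {3, 5, 10, 12}, 4: {2, 7, 8},
           5: {1, 6, 9}, 6: {2, 4, 7, 11}, 7: {5, 10, 12}}
print(cluster_identification(example))
# [([1, 4, 6], [2, 4, 7, 8, 11]), ([2, 3, 5, 7], [1, 3, 5, 6, 9, 10, 12])]

without_part_3 = {m: ps - {3} for m, ps in example.items()}
print(cluster_identification(without_part_3))
# [([1, 4, 6], [2, 4, 7, 8, 11]), ([2, 5], [1, 6, 9]), ([3, 7], [5, 10, 12])]
```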


8.  IMPLEMENTATION  OF  KBGAS 
8.1  Illustrative  example 

Given are the machine-part incidence matrix shown in Eq. 1 and the vector fa (frequency of AGV trips required for handling each part). Also given are the vector fr (frequency of robot trips required for handling each part), max fa = 40 (maximum frequency of trips that can be handled by an AGV) and max fr = 100 (maximum frequency of trips that can be handled by a robot). The vector T is the column outside the matrix in Eq. 1. The task is to solve the group technology problem for these data, with at most 3 machines in a machine cell.

fa  [11  30  2.5  -  6  10  -  6  7  15  18  14]  max-fa  (40) 

fr  [11  30  5  3  6  15  10  12  7  -  36  28]  max-fr  (100) 




                         Part number
Machine |   1   2   3   4   5   6   7   8   9  10  11  12 |   T
   1    |   0   4   0  21   0   0   0   8   0   0   0   0 |  40
   2    |  26   0   5   0   0  10   0   0   0   0   0   0 |  40
   3    |   0   0  20   0  10   0   0   0   0  22   0   8 |  40
   4    |   0  35   0   0   0   0   2   6   0   0   0   0 |  50
   5    |   5   0   0   0   0   6   0   0  25   0   0   0 |  50
   6    |   0  16   0  10   0   0   3   0   0   0  18   0 |  60
   7    |   0   0   0   0   1   0   0   0   0   7   0   7 |  50      (1)

The result is shown in matrix 2:

                         Part number
Machine |   1   6   9   5  10  12   2   4   7   8  11   3
   2    |  26  10   0   0   0   0   0   0   0   0   0   5
   5    |   5   6  25   0   0   0   0   0   0   0   0   0
   3    |   0   0   0  10  22   8   0   0   0   0   0  20
   7    |   0   0   0   1   7   7   0   0   0   0   0   0
   1    |   0   0   0   0   0   0   4  21   0   8   0   0
   4    |   0   0   0   0   0   0  35   0   2   6   0   0
   6    |   0   0   0   0   0   0  16  10   3   0  18   0      (2)


Three machine cells, MC-1 = {2, 5}, MC-2 = {3, 7} and MC-3 = {1, 4, 6}, have been generated. The three corresponding part families are PF-1 = {1, 6, 9}, PF-2 = {5, 10, 12} and PF-3 = {2, 4, 7, 8, 11}. Part 3 has been assigned to a functional manufacturing facility. Two AGVs and a handling robot tend the three machine cells.
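The reported cells can be cross-checked against Eq. (1) with a short script. The sketch below uses processing times and available times transcribed from Eq. (1). It finds the parts whose machines do not all lie in one cell, and it checks constraint 1 for every machine using only that machine's own part family. It also shows the load machine 3 would carry if part 3 stayed with PF-2. The helper names are illustrative. The AGV and robot assignment is not re-checked here.

```python
proc = {  # machine -> {part: processing time}, transcribed from Eq. (1)
    1: {2: 4, 4: 21, 8: 8},
    2: {1: 26, 3: 5, 6: 10},
    3: {3: 20, 5: 10, 10: 22, 12: 8},
    4: {2: 35, 7: 2, 8: 6},
    5: {1: 5, 6: 6, 9: 25},
    6: {2: 16, 4: 10, 7: 3, 11: 18},
    7: {5: 1, 10: 7, 12: 7},
}
T = {1: 40, 2: 40, 3: 40, 4: 50, 5: 50, 6: 60, 7: 50}   # available time per machine
cells = [({2, 5}, {1, 6, 9}), ({3, 7}, {5, 10, 12}), ({1, 4, 6}, {2, 4, 7, 8, 11})]


def visited_machines(part):
    return {m for m, row in proc.items() if part in row}


def load(machine, parts):
    return sum(proc[machine].get(p, 0) for p in parts)


all_parts = set().union(*(row.keys() for row in proc.values()))
# Parts whose machines do not all lie in a single cell.
exceptional = sorted(p for p in all_parts
                     if not any(visited_machines(p) <= ms for ms, _ in cells))
# Constraint 1 for every machine, counting only its own cell's part family.
loads_ok = all(load(m, ps) <= T[m] for ms, ps in cells for m in ms)
# Every family part is processed only inside its own cell.
closed = all(visited_machines(p) <= ms for ms, ps in cells for p in ps)

print(exceptional, loads_ok, closed)       # [3] True True
print(load(3, {3, 5, 10, 12}), T[3])       # 60 40: part 3 would overload machine 3
```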


8.2 Implementation

The Knowledge-Based Group Analysis System (KBGAS) was developed using the XPLAIN programming tool. XPLAIN is a software development tool available on UNIX workstations using the X-window system. This tool has been successfully used to develop a welding expert system at SINTEF-NTH. XPLAIN combines features from an advanced User Interface Management System (UIMS), an expert system, a spreadsheet system, a sketching tool and a database system (see Fig. 2). The layout of the user interface is described by sketching interactive forms, and the functionality is defined in expert-system-like rules. XPLAIN is not a code-generating system but an interactive one. While your XPLAIN application is running, you may step into the rule editor, change rules and add rules. You may then step back into your running application just by pushing a function button.

The system operates through a pull-down menu: the user moves
between items with the cursor keys and then activates the desired
module. Figure 3 shows the main menu, which contains 11 items.




Figure 2. XPLAIN joins features from different types of
programming tools.


The names of the items are self-explanatory. If the MANUAL INPUT item
is selected, the input data model is activated and can be interacted
with. Figure 4 shows a sub-menu, which further guides the user
in entering the required information, such as the number of machines,
the number of parts, the maximum processing time for each machine and
the machine-part matrix. The EXECUTE item can then be selected: it
processes the input information, interrogates the knowledge base and
the models-and-algorithms base, starts the inference engine and
executes the clustering algorithm. Depending on the item chosen in the
main menu, the output can be obtained in three different forms:
(1) DATA OUTPUT, shown in Figure 5; (2) MATRIX FORM, displayed in
Figure 6; and (3) PHYSICAL LAYOUT, shown in Figure 7.

As shown in Figure 5, three machine cells and three part families
have been formed for the given example. MC-1 is served by
an AGV, MC-2 by a robot, and MC-3 by either a
robot or an AGV. The overlapping part 3 is placed on the part
waiting list. The computation was performed with the maximum cell
size set to 3.


9.  CONCLUSIONS 

This paper presented a general formulation of the group technology
problem in an automated manufacturing system. The
formulation involves a matrix of processing times and three
constraints. These relate to the processing time available at
each machine, the requirement for material handling carriers, and the
maximum number of machines allowed per machine cell. To solve the
grouping problem, a knowledge-based system (KBGAS) was developed
using the XPLAIN programming tool. KBGAS integrates a
heuristic algorithm with a knowledge-based system.

Many models and algorithms have been developed and employed in
manufacturing engineering, and knowledge-based systems have received
considerable attention. Integrating these models and algorithms with
knowledge-based systems is an important research area in flexible,
automated, computer-integrated manufacturing. The system presented
here demonstrates one approach to integrating some of these models
and algorithms with a knowledge-based system by using the XPLAIN
programming tool.


[Screen image: main menu titled "Group Analysis in GT"]

Figure 3. The main menu.


[Screen image: input form with fields for the number of machines, the number of parts, the maximum processing time of each machine, the machine-part matrix, AGV and robot frequency data with their maxima, the machine cell size, an Execute item and a Part Waiting List (Y/N) option]

Figure 4. The menu for the input matrix.




[Screen images. Recoverable content: "MC Layout"; Machine Cell (1): M3, M7; Part Family (1): P5, P10, P12; MHS alternative: AGV. Machine Cell (2): M6, M1, M4; Part Family (2): P2, P4, P7, P11, P8; MHS alternative: Robot. Machine Cell (3): M5, M2; Part Family (3): P1, P6, P9; MHS alternative: Robot or AGV. Part Waiting List: P3. Bottleneck Machines: No. Number of machines: 7. Number of parts: 12. Machine cell size: 3.]

Figure 5. The menu for data output.
Figure 6. The menu for output in matrix form.
Figure 7. The menu for the physical layout.




10.  REFERENCES 

1. Anderberg, M. D., Cluster analysis for applications. Academic
Press, New York, 1973.

2. Bratli, A., Engh, E. and Røstadsand, P. A., The use of
computer system for generation of welding procedures and
quality control in fabrication of offshore structures and
pipework. Proceedings of EUROJIN 1 (First European
Conference on Joining Technology), Strasbourg, France, pp.
103-112, 1991.

3. Brooks, R. A., Programming in Common LISP. John Wiley &
Sons, New York, 1985.

4. Burbidge, J. L., Production Flow Analysis. Clarendon Press,
Oxford, 1989.

5. Engh, E., SIMWELD - a system for cost simulation.
Proceedings of EUROJIN 1, Strasbourg, France, pp. 35-41,
1991.

6. Falster, P., Structural techniques for the design of
production systems. Advances in Production Management
Systems, North-Holland, pp. 23-42, 1986.

7. Harhalakis, G., Ioannis, M. and Rakesh, N., Development and
application of a knowledge based system for cellular
manufacturing. Proceedings of the Third International
Conference on Expert Systems and the Leading Edge in Production
and Operations Management, Columbia, USA, pp. 343-355,
1989.

8. Kusiak, A., A knowledge based system for group technology.
International Journal of Production Research, vol. 26, no. 5,
pp. 887-904.

9. Lassila, O., Knowledge-based algorithm for group analysis.
Advances in Production Management Systems, Elsevier Science
Publishers B.V. (North-Holland), IFIP, pp. 507-513, 1991.

10. Myklebust, O., Knowledge-based process planning with object
oriented implementation. Complex Machining and AI-Methods,
North-Holland, pp. 49-58, 1991.

11. Rajagopalan, R. and Batra, J. L., Design of cellular
production systems - A graph-theoretic approach.
International Journal of Production Research, vol. 13, no. 6,
pp. 567-579, 1975.

12. Romstad, A., User requirement specification for the process
planning supervisor, version 2. SINTEF, 1990.

13. Shambu, G., Ramaswamy, R. and Rao, H. R., A rule-based system
for scheduling in a hybrid group technology environment.
Proceedings of the Third International Conference on Expert
Systems and the Leading Edge in Production and Operations
Management, Columbia, USA, pp. 357-367, 1989.

14. Spur, G. and Specht, D., Knowledge-based diagnosis in
manufacturing systems. Manufacturing Systems, vol. 19, no. 2,
1990.

15. Wold, P., Experience from development and implementation of
QCWELD - A computer-aided NDT planning, documentation and
inspection system. Proceedings of EUROJIN 1, Strasbourg,
France, pp. 215-226, 1991.


CENTER: A System Architecture for Matching Design and Manufacturing

Bei-Tseng Bill Chu and He Du
Department of Computer Science
University of North Carolina at Charlotte
Charlotte, NC 28223 (billchu@unccvax.uncc.edu)

ABSTRACT 

This paper presents CENTER, a new architecture for design for manufacturing.
CENTER is based on a new methodology for matching product design with manufacturing
processes. The power of CENTER derives from a new technique for analyzing
manufacturing data and from the use of knowledge-based techniques to guide design
simulations based on the results of that analysis. CENTER can be used to construct
knowledge-based systems that would complement the prevailing practice of relying
exclusively on human-directed design simulations. We present two case studies: the
application of CENTER to a welding process and its application to VLSI manufacturing.


1.  Design  for  manufacturing 

Matching product design with the manufacturing process is the key to the success
of statistical process control. To understand this concept, let us first consider the
simplified view illustrated in Figure 1. Suppose the quality of a product is determined
by parameters x and y. The box in Figure 1a (also referred to as the process window)
shows the ability to control these parameters in actual manufacturing. That is, the
vast majority of products (e.g. 99.9999998%, or 6σ) can be made with x and y within
the box. The design engineer must then make sure that the design works with
any parameters in the process window. It is typically difficult to determine the area
of the parametric space where a particular design works, indicated by a dotted
boundary in Figure 1. Simulation is the principal tool used to verify designs, but it
is impossible to simulate every point within the process window. One typically
simulates the “worst case scenarios” and hopes that the area of the parametric space
where the design works is larger than the process window (as illustrated in Figure 1a).

If the area of the parametric space where a design works is larger than the
process window, we say that design and manufacturing are centered, as illustrated
by Figure 1a. In such a case the manufacturing process produces very
few defective parts (3 defective parts per million if a process is under 6σ control).
Figure 1b, on the other hand, shows a case where design and manufacturing are
not centered. In this case, many parts will still fail because of the design, even
though the vast majority of them are within the process window (box).
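The two figures quoted above rest on different conventions. 99.9999998% is the share of a centred normal process that lies within ±6σ. The familiar figure of about 3 (more precisely 3.4) defective parts per million comes from the widespread six-sigma convention that the process mean may drift by 1.5σ. The sketch below uses only the standard library's `math.erfc` to compute both. The function names are illustrative.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a centred normal process lying within +/- k sigma of the mean."""
    return 1.0 - math.erfc(k / math.sqrt(2.0))


def defects_ppm(k: float, shift: float = 0.0) -> float:
    """Parts per million outside a +/- k sigma window when the mean drifts by `shift` sigma."""
    upper = 0.5 * math.erfc((k - shift) / math.sqrt(2.0))
    lower = 0.5 * math.erfc((k + shift) / math.sqrt(2.0))
    return 1e6 * (upper + lower)


print(f"{100 * fraction_within(6):.7f}%")  # 99.9999998%
print(f"{defects_ppm(6):.4f} ppm")          # 0.0020 ppm (centred process)
print(f"{defects_ppm(6, 1.5):.1f} ppm")     # 3.4 ppm (mean shifted by 1.5 sigma)
```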

In this paper we define design sensitivity as the difference between the process
window and the area where the design actually works. Figure 1a has no design
sensitivity; the left-hand corners of the box in Figure 1b are design sensitivities.
Design sensitivities occur despite the great care exercised by design engineers. Their
prevalence in the integrated circuit industry is well documented [9], and they also
occur in other processes such as welding. Discovering design sensitivities is a very
difficult problem, especially when a non-negligible amount of random defects
exists. The CENTER architecture presented in this paper is designed for
constructing intelligent assistants that help identify design sensitivities.

¹ This research is supported by a grant from the National Science Foundation,
MIP-9017151.

2.  The  CENTER  system  architecture 

The CENTER architecture is based on the idea that feedback from manufacturing
data can be used to guide simulations intelligently, so that design and manufacturing
can be centered. Suppose a part is made with parameters $x_0$ and $y_0$ within the
process window, and the part is judged to be defective. Then one of two explanations
must be true. First, the failure may be due to a design sensitivity: the design will
always fail when the parameters take the values $x_0$ and $y_0$ (such as one of the left
corners of the box in Figure 1b). Second, the failure may be due to some uncontrolled
factor(s), i.e. a random defect. A statistical model has been developed [2] to
distinguish the two cases. The goal of CENTER is first to use such a statistical
model to hypothesize design sensitivities (i.e. occurrences with a high probability of
being design sensitivities). It then uses knowledge-based techniques to target
specific simulation runs that verify these hypotheses. Design engineers would benefit
from CENTER by learning of unexpected design sensitivities.

Figure 2 illustrates the CENTER architecture. This architecture is designed to
address the common need to center design specifications in manufacturing process
windows. The goal is to isolate the modules that can be shared across manufacturing
domains and to provide an architecture in which such generic modules can work
with other, domain-specific modules. The generic module in the CENTER architecture
is the design-sensitivity hypothesis module formulated in the next section of this
paper. Hypotheses proposed by the design-sensitivity hypothesis module are verified
by a domain-specific knowledge-based system. If a hypothesis is rejected, the
hypothesis module may reformulate it. Thus CENTER shares the hypothesize-and-test
principle with many intelligent systems [8].

3.  Hypothesis  of  design  sensitivities 

Throughout the rest of the paper, the term parametric space refers to the
(multidimensional) region from which manufactured parts take their parameter values.
In other words, a parametric space represents the capability of the manufacturing
process. The approach we take can be stated as:

If  one  has  observed  that  there  is  a  region  in  the  parametric  space  containing  failed 
parts  only,  and  the  probability  of  this  event  occurring  due  to  random  defects  is  very 
small,  then  one  can  hypothesize  a  design  sensitivity  in  this  region  of  the  parametric 
space. 


This strategy can be carried out in two steps. First, we look for a region in the
parametric space containing failed parts only. Second, we construct a probability
model for random defects and perform a statistical hypothesis test.

3.1.  Hypothesizing  a  region  in  the  parametric  space 

There are many ways to define the boundary of a group of failed dies in a
d-dimensional parametric space. Our aim is to look for a region suspected of
having a design sensitivity; we rely on expert investigation to determine the root
cause of the design sensitivity. Therefore, following Occam’s Razor, we are only
concerned with convex regions, where each surface of the boundary represents some
linear combination of parameters that may cause a design sensitivity. The simplest
algorithm would be the Convex Hull method [4]. However, because we are dealing
with many points in a high-dimensional space, the complexity of that algorithm is
too great to be practical.

We instead use a greedy algorithm based on a series of linear discriminant
surfaces [3]. We start with a d-dimensional hypercube representing the process
window. Figure 3(a) shows a cube representing the specification for three parameters;
failed parts are depicted as dark dots. The simplest case involves finding a
linear surface separating the passed parts from the failed dies, as indicated in Figure
3(b). This surface, together with the boundary surfaces of the parametric space
specification, forms the convex region of interest.


More generally, the two-category linear discriminant method finds a
decision surface that cuts a given space into two parts. The decision surface can be
represented as $g(\mathbf{x}) = 0$, where $\mathbf{x}$ is a vector of variables of the
parametric space and $g(\mathbf{x})$ is a linear discriminant function:

$$g(\mathbf{x}) = \mathbf{w}^{t}\mathbf{x} + w_0 \qquad (1)$$

where $\mathbf{w}$ is a vector of constants called the weight vector and $w_0$ is the
threshold weight. The two half-spaces separated by $g(\mathbf{x}) = 0$ are
$g(\mathbf{x}) > 0$ and $g(\mathbf{x}) < 0$.

Typically, we cannot completely separate all failed and passed dies using one
discriminant surface. However, a series of such surfaces will define a convex region
enclosing a group of failed dies.


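More specifically, let

$$\mathbf{Y} = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix}, \qquad
\mathbf{A} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix} = \begin{bmatrix} w_0 \\ \mathbf{w} \end{bmatrix};$$

then

$$g(\mathbf{x}) = \mathbf{A}^{t}\mathbf{Y} \qquad (2)$$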


Consider n parts with position vectors $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n$ in the parametric space.
We would like to construct a surface $g(\mathbf{x})$ that minimizes the perceptron criterion
function [3]. A succession of such linear surfaces will define a convex region containing
failed parts only.
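The perceptron criterion can be minimized in several ways. One classical way is the fixed-increment single-sample rule, illustrated below on a toy two-parameter data set. In sign-normalized form the criterion is $J_p(\mathbf{A}) = \sum_{\mathbf{z} \in \mathcal{M}} (-\mathbf{A}^t\mathbf{z})$, summed over the misclassified samples $\mathcal{M}$. The sketch finds a single separating surface only. It does not reproduce the greedy series of surfaces that CENTER uses to close off a convex region, and the data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def augment(x: Sequence[float]) -> Vector:
    """Augmented vector Y = [1, x_1, ..., x_d] of equation (2)."""
    return [1.0] + [float(v) for v in x]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def train_perceptron(failed: Sequence[Sequence[float]],
                     passed: Sequence[Sequence[float]],
                     eta: float = 1.0,
                     max_epochs: int = 1000) -> Tuple[Vector, bool]:
    """Fixed-increment single-sample perceptron.

    Returns (A, converged), with A = [w_0, w_1, ..., w_d].  When converged is
    True, g(x) = A^t Y is positive for every failed part and negative for
    every passed part.
    """
    # Negate the passed samples so that a correct surface satisfies A^t z > 0 for all z.
    samples = [augment(x) for x in failed] + [[-v for v in augment(x)] for x in passed]
    a = [0.0] * len(samples[0])
    for _ in range(max_epochs):
        mistakes = 0
        for z in samples:
            if dot(a, z) <= 0:  # misclassified: this z contributes -A^t z to J_p(A)
                a = [ai + eta * zi for ai, zi in zip(a, z)]  # step along -grad J_p
                mistakes += 1
        if mistakes == 0:
            return a, True
    return a, False


def g(a: Sequence[float], x: Sequence[float]) -> float:
    """Linear discriminant g(x) = A^t Y."""
    return dot(a, augment(x))


# Toy two-parameter example (hypothetical data).
FAILED = [(3, 3), (4, 3), (3, 4)]
PASSED = [(0, 0), (1, 0), (0, 1)]
A, converged = train_perceptron(FAILED, PASSED)
print(A, converged)  # [-3.0, 1.0, 2.0] True
```

For linearly separable data this rule is guaranteed to stop after a finite number of corrections. When the data are not separable, the `max_epochs` cap ends the loop, which matches the text's remark that one surface typically cannot separate all failed and passed dies.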


3.2. Differentiating random defects and design sensitivity

With enough manufacturing data, one expects parts that failed because of design
sensitivities to be localized in certain areas of the parametric space. Parts that
failed because of random defects, by contrast, will permeate the entire parametric
space. However, one would like to detect design sensitivities early in the development
process. The challenge is therefore to find a probability model that leads to a
statistical test differentiating random defects from design sensitivities using as
little data as possible.

We use the following formulation to describe a generic manufacturing concern.
Suppose that parts are manufactured in lots. Each lot contains W slots, and each slot
consists of K positions. Each position contains a part being manufactured. This
characterization can be used to model both batch and discrete manufacturing
processes. In a discrete manufacturing case, where one part is manufactured at a time,
W = K = 1.

Suppose the parametric space is d-dimensional, and that L lots have been
manufactured. Values for all d parameters of all T = L × W × K parts have been
measured, and each part has been tested for pass/fail. Suppose that F of the total
T parts failed.

Each manufactured part can be regarded as a point in the d-dimensional space.
Suppose a convex region X with N defective parts is found in the parametric space.
Let the probability of X occurring due to a design sensitivity be denoted as $p_{ds}$,
and let $p_r = 1 - p_{ds}$ be the probability of X occurring due to random defects.

We now show how $p_r$ can be estimated. Suppose we take the position
that there are no design sensitivities and that the failures of parts are independent.
Each part then fails with probability F/T, so the probability of having N failed
parts in region X is

$$p_X = \left(\frac{F}{T}\right)^{N} \qquad (3)$$

Since there are a total of T parts, we have T/N regions of N parts each, and the
problem becomes binomial. The probability for each region to contain only failed
parts is $p_X$, so the probability of finding at least one region with all bad parts
due to random defects is:

$$p_r = 1 - (1 - p_X)^{T/N} \qquad (4)$$

However, real-world cases are more complicated in that the failures of parts
may not be independent. In the most general case we expect three types of
dependencies. Lot dependency refers to cases where parts made within the same lot
tend to fail together. Slot dependency refers to cases where parts made within the
same slot tend to fail together. Cluster dependency refers to cases where parts made
near each other in the same slot tend to fail together. It differs from slot dependency
in that parts made within the same slot but sufficiently far apart may still fail
independently. The idea is to find the largest dependent group (lot, slot, or cluster)
based on observed data. Let Q be the number of parts in such a group, and assume
that failures within a group are dependent while failures across groups are
independent; then

$$p_X \le \left(\frac{F}{T}\right)^{\lfloor N/Q \rfloor} \qquad (5)$$

Combining (4) and (5) we have:

$$p_r \le 1 - \left(1 - \left(\frac{F}{T}\right)^{\lfloor N/Q \rfloor}\right)^{T/N} \qquad (6)$$
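Equations (4) to (6) are easy to evaluate directly. The sketch below computes the bound of equation (6), using a hypothetical function name and illustrative numbers; with Q = 1 it reduces to equation (4). It relies on `math.log1p` and `math.expm1` because, for realistic N, $p_X$ is far below double-precision resolution. A naive `1 - (1 - p_x) ** (T / N)` would therefore round to zero.

```python
import math


def random_defect_probability(F: int, T: int, N: int, Q: int = 1) -> float:
    """Upper bound on p_r from equation (6); with Q = 1 it equals equation (4).

    F: failed parts, T: total parts, N: failed parts inside region X,
    Q: number of parts in the largest dependent group (lot, slot or cluster).
    """
    if not (0 < N <= F <= T) or Q < 1:
        raise ValueError("need 0 < N <= F <= T and Q >= 1")
    k = N // Q                  # independent groups guaranteed inside X
    if k == 0:
        return 1.0              # X is smaller than one dependent group: no evidence
    p_x = (F / T) ** k          # right-hand side of equation (5)
    if p_x >= 1.0:
        return 1.0              # every part failed: random defects cannot be ruled out
    # 1 - (1 - p_x)^(T/N), via log1p/expm1 so that a tiny p_x does not round to zero
    return -math.expm1((T / N) * math.log1p(-p_x))


# Hypothetical numbers: 600 of 1000 parts failed, region X holds 40 failed parts.
for q in (1, 10, 40):
    print(q, random_defect_probability(600, 1000, 40, q))
```

For these illustrative numbers, the same region is strong evidence of a design sensitivity if failures are independent (Q = 1). It is almost no evidence if groups of ten parts tend to fail together, which is why the choice of Q matters.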

To determine the largest dependent group, one starts by looking for evidence of
lot-dependent defects (this step is not necessary if the lot size is one). We assume
that, in the absence of lot-dependent defects, the success rate for each lot follows a
normal distribution (a binomial distribution if the lot size is small). Well-known
statistical tests [1] exist to verify whether observed data follow such a distribution.
If there is evidence of lot-dependent defects, we assume that Q equals the lot size.

If, on the other hand, we fail to find evidence of lot-dependent defects, we then
look for evidence of slot-dependent defects (again, this step is skipped if the slot size
is one). We again base such a test on the assumption that, in the absence of
slot-dependent and lot-dependent defects, slot success rates follow a normal
distribution (a binomial distribution if the slot size is small). If evidence of
slot-dependent defects is found, we take Q to be the number of parts per slot.

If the evidence supports neither lot-dependent nor slot-dependent defects, we have
to consider the cluster size. If the number of parts per slot is one, then Q is one.
Otherwise the cluster size varies with the manufacturing process, and process-dependent
knowledge is needed to estimate Q. It is common to use a gamma (Γ) distribution to
model cluster size [5].
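The text does not say which of the "well-known statistical tests [1]" is used. The sketch below assumes one common choice: a chi-square dispersion (binomial homogeneity) test on the pass counts per lot or per slot, judged with a normal approximation. It wires this test into the lot, slot, cluster order described above. The function names and the roughly 1% threshold are hypothetical, and the cluster size still has to come from process knowledge.

```python
import math
from typing import Sequence


def overdispersed(pass_counts: Sequence[int], group_size: int, z_crit: float = 2.326) -> bool:
    """Chi-square dispersion test for binomial homogeneity.

    pass_counts[i] is the number of passing parts in group i (a lot or a slot),
    each group holding `group_size` parts.  Returns True when the counts vary
    more than independent failures would allow.  The statistic has k - 1
    degrees of freedom and is compared via the normal approximation
    z = (chi2 - (k - 1)) / sqrt(2 (k - 1)); z_crit = 2.326 is about 1% one-sided.
    """
    k = len(pass_counts)
    if k < 2:
        return False
    p = sum(pass_counts) / (k * group_size)
    if p in (0.0, 1.0):
        return False
    expected_var = group_size * p * (1.0 - p)
    chi2 = sum((x - group_size * p) ** 2 for x in pass_counts) / expected_var
    z = (chi2 - (k - 1)) / math.sqrt(2.0 * (k - 1))
    return z > z_crit


def choose_q(lot_pass: Sequence[int], parts_per_lot: int,
             slot_pass: Sequence[int], parts_per_slot: int,
             cluster_size: int) -> int:
    """Size Q of the largest dependent group, checked in lot -> slot -> cluster order."""
    if parts_per_lot > 1 and overdispersed(lot_pass, parts_per_lot):
        return parts_per_lot
    if parts_per_slot > 1 and overdispersed(slot_pass, parts_per_slot):
        return parts_per_slot
    if parts_per_slot == 1:
        return 1
    return cluster_size  # process-dependent estimate, e.g. from a cluster-size model
```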

4. CENTER-VLSI: VLSI circuit manufacturing

This section describes CENTER-VLSI, a prototype system we have built on the CENTER
architecture for identifying design sensitivities in VLSI circuit manufacturing.
CENTER-VLSI uses the generic design-sensitivity identification module of Figure 2.
A number of VLSI circuits (parts) are fabricated together on a wafer (slot), and
wafers are organized into lots. The selection of cluster size is based on [5].

Figure 4 depicts a typical methodology for designing a VLSI circuit. A VLSI circuit
typically involves over one million transistors, so it is impossible to simulate the
entire circuit based on electrical parameters (e.g. mobility, threshold voltage).
Instead, a design engineer selects a set of electrical parameters from the parametric
space defined by the manufacturing process. These parameter values are used to
simulate basic logic elements (e.g. gates, flip-flops) and identify their delay
characteristics (Figure 5 shows some typical delay characteristics for a flip-flop).
The entire circuit is then simulated at the logic level to verify the design.

The dimensionality, proportional to the sophistication of the technology used, is
typically very high (greater than 10). Resource constraints dictate that only a very
small fraction of the parametric space, referred to as the worst case scenarios, is
selected for simulation according to Figure 4. Selecting such scenarios is an art
practiced only by the most experienced engineers. Even then, design sensitivities
are difficult to avoid, especially in early design versions, because it is difficult to
foresee all potential interactions among the large number of transistors.

The CENTER architecture is ideally suited to attack such a problem. We now
describe how a prototype system, CENTER-VLSI, addresses this problem domain
under the CENTER architecture. The basic idea of CENTER-VLSI is to have a program
initiate all the simulations needed to verify the design sensitivity discovered by the
design-sensitivity identification module.

CENTER-VLSI, whose organization is shown in Figure 6, relies on a simulation case
base acquired from the design engineer. The case base is indexed by the name/version
of the design. When an engineer performs a SPICE [7] simulation (based on electrical
parameters), the circuit input (in a format readable by SPICE) is recorded in the
case base with the help of the engineer. The delay characteristics extracted from the
SPICE simulation are also recorded. Typically the design engineer uses an automated
tool to construct the logic circuit design from simulated circuit fragments and their
delay characteristics. In this phase of design, CENTER-VLSI plays the role of a
learning apprentice [6].

When manufacturing data have been collected and a design-sensitivity region, X, has
been hypothesized, the simulation driver of CENTER-VLSI randomly selects several
points from X and repeats the simulations recorded in the case base. A complete
logic simulation is also performed, based on the newly obtained delay characteristics,
at the frequency at which the actual circuit failed. If the design fails under a set
of electrical parameters hypothesized by X, then X is identified as a design
sensitivity and the result is presented to the engineer for design revision. Otherwise,
the simulation driver marks all parts in X as passed and reinvokes the
design-sensitivity hypothesis module.
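The driver loop just described can be sketched as follows. This is an illustration under simplifying assumptions, not CENTER-VLSI's code. The hypothesis module and the SPICE and logic simulations are passed in as hypothetical callables (`hypothesize`, `design_fails_at`). A region X is represented by the indices of the parts that fall inside it, and the "random points from X" are drawn from those parts' measured parameter vectors.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# hypothesize(params, status) -> indices of the parts inside a hypothesised region X, or None
Hypothesizer = Callable[[Sequence[Point], List[bool]], Optional[List[int]]]


def simulation_driver(params: Sequence[Point], failed: Sequence[bool],
                      hypothesize: Hypothesizer,
                      design_fails_at: Callable[[Point], bool],
                      n_samples: int = 5, max_rounds: int = 10,
                      seed: int = 0) -> Optional[List[int]]:
    """Return the parts of a confirmed design-sensitive region X, or None."""
    rng = random.Random(seed)
    status = list(failed)                        # working copy of pass/fail labels
    for _ in range(max_rounds):
        region = hypothesize(params, status)
        if not region:
            return None                          # no further hypothesis to test
        picks = rng.sample(region, min(n_samples, len(region)))
        if any(design_fails_at(params[i]) for i in picks):
            return region                        # confirmed: report X to the engineer
        for i in region:                         # rejected: mark parts in X as passed
            status[i] = False                    # and re-invoke the hypothesis module
    return None


# Toy usage with hypothetical data: failures at high x are random defects,
# while the design itself fails for x < 0.2.
def toy_hypothesize(ps: Sequence[Point], st: List[bool]) -> Optional[List[int]]:
    high = [i for i, (p, f) in enumerate(zip(ps, st)) if f and p[0] > 0.8]
    low = [i for i, (p, f) in enumerate(zip(ps, st)) if f and p[0] < 0.2]
    return high or low or None


toy_params = [(0.9,), (0.95,), (0.1,), (0.5,)]
toy_failed = [True, True, True, False]
found = simulation_driver(toy_params, toy_failed, toy_hypothesize, lambda p: p[0] < 0.2)
print(found)  # [2]
```

In the toy run, the first hypothesis (the high-x failures) is rejected by simulation and its parts are relabelled as passed. The second hypothesis is then confirmed.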

Because real design/process data are difficult to obtain, we tested
CENTER-VLSI with a detailed simulated experiment. The experiment has two parts.
In the first part we design a circuit with a known design sensitivity. In the second
part we test whether CENTER-VLSI can discover this “bug” on its own.

In the first part, we follow the design methodology of Figure 4 but avoid selecting
any electrical parameters from the known design-sensitive region. The circuit we use
is a Self-Voltage Controlled Oscillator circuit (Figure 7). We selected this circuit
because its frequency is very sensitive to device parameters. Clearly, a trained design
engineer is unlikely to commit the kind of design mistake we introduce on purpose
in such a simple circuit. The idea being tested here is that the same type of design
sensitivity (e.g. in terms of the combination of circuit parameters) could have been
committed in a much more complicated circuit due to unforeseen interactions.

The circuit has two components: an Oscillating Route and a Voltage Reference. The
Oscillating Route has 31 stages of CMOS inverters and three voltage-controlled NMOS
transmission transistors. The Voltage Reference is a differential amplifier with a
built-in voltage divider as its input. Both components are sensitive to parameters,
especially to the threshold voltages and gain factors of the two device types.

In this example, we selected the following electrical parameters:
TOX, VTON, VTOP, KPN and KPP. TOX is the gate oxide thickness for both NMOS and
PMOS devices; VTON and VTOP are the NMOS and PMOS threshold voltages (vsb = 0),
respectively; KPN and KPP are the NMOS and PMOS gain factors, respectively. The
assumption is that we are building this circuit with only one size of transistor. In
a more realistic situation, transistors of different sizes would be used at the same
time, and the number of electrical parameters would increase accordingly.

We selected the MOSIS process (a public fabrication process funded by the
National Science Foundation) as our reference manufacturing process. We
obtained a set of approximate process specifications for these five parameters, as
well as for the other parameters used in our simulations.

We first divide the specified range of each of these five parameters into five
intervals. We assume all other parameters to be constant and select the midpoint of
each interval, which gives a total of 5^5 (3125) possible parametric combinations. We
use SPICE3e2 to simulate the frequency performance at all these combinations,
which yields a frequency distribution over a five-dimensional parametric space. We define
35 MHz as the lowest acceptable frequency for this design; thus the region of the
parametric space corresponding to frequencies under 35 MHz is a design sensitivity.
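The sketch below reproduces only the grid construction, using the standard library. The real parameter ranges and the SPICE3e2 runs are not reproduced. Normalised [0, 1] ranges and a made-up `toy_frequency` function stand in for them. Only the counting (5 intervals per parameter, giving 5^5 = 3125 midpoint combinations) and the 35 MHz threshold come from the text.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def interval_midpoints(lo: float, hi: float, k: int = 5) -> List[float]:
    """Midpoints of k equal intervals covering [lo, hi]."""
    width = (hi - lo) / k
    return [lo + (j + 0.5) * width for j in range(k)]


def build_grid(ranges: Sequence[Tuple[float, float]], k: int = 5) -> List[Point]:
    """All k**d combinations of interval midpoints (5**5 = 3125 for five parameters)."""
    return list(product(*(interval_midpoints(lo, hi, k) for lo, hi in ranges)))


def design_sensitive(grid: Sequence[Point],
                     frequency: Callable[[Point], float],
                     f_min: float = 35e6) -> List[Point]:
    """Grid points whose simulated frequency falls below f_min (35 MHz in the text)."""
    return [pt for pt in grid if frequency(pt) < f_min]


# Normalised [0, 1] ranges stand in for the process specifications of
# TOX, VTON, VTOP, KPN, KPP, which are not given in the text.
GRID = build_grid([(0.0, 1.0)] * 5)


def toy_frequency(pt: Point) -> float:
    """Hypothetical stand-in for a SPICE run: frequency drops as VTON (pt[1]) rises."""
    return 50e6 * (1.0 - 0.5 * pt[1])


print(len(GRID), len(design_sensitive(GRID, toy_frequency)))  # 3125 1250
```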

We proceed to define a simulated manufacturing process that assigns electrical
parameter values to each “manufactured” part (a five-place vector). However,
precise distributions of device parameters in real manufacturing are difficult to
obtain. To carry out our experiment, we made the following simplifying assumptions.
To a certain extent, these assumptions can be justified by the statistical
characteristics of device parameter distributions.

(1) The parameter average value for a given lot follows a Gaussian distribution.

(2) The parameter variance within a lot is smaller than the variance among lots.
Furthermore, the parameter variance of a lot follows a Gaussian distribution.

(3) The parameter average value within a wafer in a given lot follows a Gaussian
distribution for VTON, VTOP, KPN and KPP. The mean value of TOX for a wafer
is assumed to be uniformly distributed within the limits for the lot.

(4) The parameter variance within a wafer is smaller than that within the lot
containing the wafer. This variance also follows a Gaussian distribution.

(5) Within a wafer, the distribution of parameters is linear for TOX and concentric
for VTON, VTOP, KPN and KPP.
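The sketch below shows one way to turn assumptions (1) to (4) into a generator for a single Gaussian-type parameter. It is a hypothetical reading, not the authors' simulator. The ratios used to keep the within-lot and within-wafer spreads smaller, the example mean of 0.8 for VTON, and the function name are all illustrative. The spatial patterns of assumption (5) and the uniform TOX wafer means of assumption (3) are left out.

```python
import random
from typing import List


def simulate_parameter(n_lots: int, wafers_per_lot: int, dies_per_wafer: int,
                       mean: float, sd_between_lots: float,
                       rng: random.Random) -> List[List[List[float]]]:
    """values[lot][wafer][die] for one Gaussian-type parameter such as VTON.

    Follows assumptions (1)-(4) only; the 0.5 / 0.1 / 0.9 ratios are
    hypothetical choices that keep each inner spread below the outer one.
    """
    values = []
    for _ in range(n_lots):
        lot_mean = rng.gauss(mean, sd_between_lots)                        # (1)
        # (2): within-lot spread drawn from a Gaussian, capped below the between-lot spread
        lot_sd = min(abs(rng.gauss(0.5 * sd_between_lots, 0.1 * sd_between_lots)),
                     0.9 * sd_between_lots)
        lot = []
        for _ in range(wafers_per_lot):
            wafer_mean = rng.gauss(lot_mean, lot_sd)                       # (3)
            # (4): within-wafer spread drawn from a Gaussian, capped below the lot spread
            wafer_sd = min(abs(rng.gauss(0.5 * lot_sd, 0.1 * lot_sd)), 0.9 * lot_sd)
            lot.append([rng.gauss(wafer_mean, wafer_sd) for _ in range(dies_per_wafer)])
        values.append(lot)
    return values


vton = simulate_parameter(n_lots=3, wafers_per_lot=4, dies_per_wafer=5,
                          mean=0.8, sd_between_lots=0.05, rng=random.Random(1))
```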

Defects on wafers are also generated randomly. We use a uniform distribution
to simulate point defects and a Gamma distribution model [5] to simulate
clustered defects. To simulate wafer-dependent defects, we use a uniform distribution
to determine whether or not a wafer is subject to wafer-dependent defects. If it is,
we again use a uniform distribution to determine which dies on that wafer are subject
to such defects. To simulate lot-dependent defects, we assume that each
lot has an equal probability of being subject to lot-dependent defects. Once a lot is
selected as being subject to a lot-dependent defect, we randomly determine which dies
would fail on a wafer, and assume that all wafers within the lot have the same
failure pattern due to the lot-dependent defects.
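The defect generator can be sketched in the same style. In this hypothetical Python version, clustered defects use a Gamma-distributed defect density per wafer with Poisson counts per die. This follows the general spirit of the Gamma model cited as [5], but the exact form used there is not reproduced. All probabilities, counts, and names are illustrative.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with mean lam (Knuth's method, small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_lot_defects(rng, n_wafers=5, n_dies=100, n_point=10,
                         mean_density=0.05, alpha=2.0,
                         p_wafer=0.1, frac_wafer=0.2,
                         p_lot=0.1, frac_lot=0.1):
    """Return one set of failed die indices per wafer (hypothetical values)."""
    # Lot-dependent defects: every lot has the same chance p_lot; if hit,
    # one random failure pattern is shared by all wafers of the lot.
    lot_pattern = set()
    if rng.random() < p_lot:
        lot_pattern = {d for d in range(n_dies) if rng.random() < frac_lot}

    wafers = []
    for _ in range(n_wafers):
        failed = set(lot_pattern)
        # Point defects: each lands on a uniformly chosen die.
        for _ in range(n_point):
            failed.add(rng.randrange(n_dies))
        # Clustered defects: a Gamma-distributed density (mean = mean_density
        # defects per die) shared by the wafer, then Poisson counts per die.
        density = rng.gammavariate(alpha, mean_density / alpha)
        for d in range(n_dies):
            if poisson(rng, density) > 0:
                failed.add(d)
        # Wafer-dependent defects: a uniform draw decides whether the wafer
        # is affected, then affected dies are picked uniformly.
        if rng.random() < p_wafer:
            failed |= {d for d in range(n_dies) if rng.random() < frac_wafer}
        wafers.append(failed)
    return wafers


wafers = simulate_lot_defects(random.Random(7))
print([100 - len(f) for f in wafers])  # good dies per wafer
```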

A total of 30,000 parts were “fabricated” using our simulated process. The
total failure rate is set at 60%, a typical situation on a pilot line. About 300 parts fall
into the design-sensitive region. CENTER-VLSI correctly identifies a subset
of the design-sensitive region containing 100 failed parts.

5.  CENTER-WELDING:  identifying  design  sensitivity  in  a  welding  process 

To demonstrate that the CENTER architecture can be applied to different application
domains, we illustrate how it can be used in a generic welding process. The
main parameters of this process are the speed of the workpiece, the feeding speed of
the welding material, and the voltage and current applied by the welding gun. The
workpieces are selectively X-rayed for defect inspection. Defects can be caused either by
process parameters or by other random effects, such as impurity of the welding
material. The CENTER architecture can be applied to identify welding conditions that
lead to defects in the workpiece.

6. Summary

This report presents CENTER as a system architecture for matching design and
manufacturing. At this writing, the main components of CENTER-VLSI have been
implemented. We are actively seeking opportunities to apply CENTER-VLSI to real
design/fabrication data.

7. References

(1) Box, G., Hunter, W., & Hunter, J. Statistics for Experimenters. New York, NY: Wiley, 1978.

^ Due to the proprietary nature of this process, we cannot present details of our
experiment.




(2) Chu, B. & Du, H. Identifying Design Sensitivities Based on Fabrication Data. Dept. of Comp. Sci.,
Univ. of North Carolina at Charlotte, Technical Report 92-4-1, April 1992.

(3) Duda, R. & Hart, P. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.

(4) Edelsbrunner, H. Algorithms in Combinatorial Geometry. Springer-Verlag, 1987.

(5) Michalka, T., Varshney, R. & Meindl, J. "A Discussion of Yield Modeling with Defect Clustering,
Circuit Repair, and Circuit Redundancy," IEEE Trans. on Semiconductor Manufacturing, Vol. 3,
No. 3, pp. 116-127, August 1990.

(6) Mitchell, T., Carbonell, J., & Michalski, R. (eds.) Machine Learning. Boston, MA: Kluwer Academic
Publishers, 1986.

(7) Nagel, L. "SPICE2: A computer program to simulate semiconductor circuits," Memo No. ERL-M520,
University of California at Berkeley, 1975.

(8) Reggia, J. Knowledge-Based Decision Support Systems: Development Through KMS. Dept. of
Comp. Sci., Univ. of Maryland, Technical Report TR-1121, Oct. 1981.

(9) Spence, R. & Soin, R. Tolerance Design of Electronic Circuits. Reading, MA: Addison-Wesley, 1988.

(10) Sze, S. VLSI Technology. New York, NY: McGraw-Hill, 1982.


Figure 1. Matching design and manufacturing

Figure 2. CENTER architecture

Figure 4. A typical VLSI design methodology

Figure 5. "D" flip-flop delay characteristics

Figure 6. CENTER-VLSI system diagram

Figure 7. 31-stage CMOS oscillator


Knowledge-Based  System  Integration  in  a 
Concurrent  Engineering  Environment* 

M.  Sobolewski 

Concurrent Engineering Research Center, West Virginia University, Morgantown, WV 26506
E-Mail:  sobol@cerc.wvu.wvnet.edu 


Abstract. The systematic integration of humans with the tools, resources, and
information assets of an organization is fundamental to concurrent engineering
(CE). In an integrated environment, all entities must first be connected, and they
must then work cooperatively. Services that support "concurrency" (through
communication, team coordination, information sharing, and integration) in
an interactive and formerly serial product development process provide the
foundation for a CE environment. Product developers working concurrently in
their application domains need both “built-in” tools and “operated-in” tools in a
computer-aided CE environment. The latter group of tools evolves over time and
requires continuous extension and change, since each subsequent product
is either new or an improvement of the previous one. Product developers need a
CE programming environment in which they can build programs from previously
developed programs, built-in tools, and knowledge bases describing how to
perform a complex design process. This paper describes a type of system
integration provided by means of a knowledge-based environment that
encompasses programs, CAx tools, and knowledge bases. The approach
is illustrated by selected examples.

1  Introduction 

Many consider system integration an ill-structured problem (the term ill-structured
problem is used here to denote a problem that does not have an explicit, clearly defined
algorithmic solution). There are no specific rules to follow when doing integration;
integration depends entirely on the environment to be integrated. Experienced
designers deal with system integration using judgement and experience. Knowledge-based
programming technology offers a methodology to tackle these ill-structured
integration and design problems. The Concurrent Engineering Research Center (CERC) has
developed such an environment, called DICEtalk (Sobolewski, 1990a,b, 1991a,b,c;
Kulpa and Sobolewski, 1992).

DICEtalk is a knowledge-based development system developed for the DARPA
Initiative in Concurrent Engineering (DICE). It is implemented in the Smalltalk
object-oriented environment (Goldberg and Robson, 1989) as a tool for engineering
design, modeling systems, and system integration. It includes a knowledge
definition and problem-solving apparatus together with a set of state-of-the-art user
interface tools. It is capable of handling multiple interacting hierarchical knowledge
bases and of integrating external (foreign) programs with the knowledge base mechanism.
It also supports natural language knowledge definition, fully menu-driven user
interaction, and graphical data presentation. More than 20 knowledge-based
engineering tasks were experimentally implemented using the system and integrated
into a design process model (Padhy & Dwivedi, 1990; Padhy, 1990; Sobolewski,
1990b, 1991a,c; Saidi, 1991; Benner, 1991; Chung, 1992).

*This work has been sponsored by the Defense Advanced Research Projects Agency
(DARPA), under Grant No. MDA972-91-J-1022 for the DARPA Initiative in Concurrent
Engineering (DICE).

If we assume that everyone in the product development cycle will make the best
decision from the overall life-cycle viewpoint when given appropriate advice at
the right time, then we can attempt to design a set of knowledge-based systems to
provide such advice. The knowledge-based integration framework requires the capture
of expert knowledge and analytical tools to build the knowledge-based environment,
which will support each product developer involved in new product design,
development, prototyping, and manufacture. Properly integrated knowledge-based
systems provide many substantive benefits, including an intuitive interaction
paradigm, transparent access to and invocation of the integrated tools, the capability
to share relevant data among tools and services, and the capability to combine the
tools' capabilities into compound transactions.

We now briefly review some terminology. If a particular set of facts is known
about the world, then this factual (declarative) knowledge can be increased if various
rules (norms) are known. Factual knowledge may be derived from both observed facts
and derived facts through a justified mode of inference. In this simple characterization,
the term declarative knowledge base is taken to mean the collection of all facts of the
world (domain), and the term inference engine refers to programs that reason with
(execute) that declarative knowledge base. An inference engine derives facts (output
facts, i.e., conclusions, and intermediate facts, i.e., findings) from other facts, which
include rules, assumptions, user answers, findings, etc. A collection of programs and
subroutines needed to compute new facts and conclusions, or used during the execution
of a declarative knowledge base by a procedural attachment mechanism, forms a
procedural knowledge base. This procedural attachment mechanism can provide
graphics, windows, animation, dictionary lookup, or file I/O. Together, the declarative
knowledge base and the procedural knowledge base are referred to as a global
knowledge base, or simply a knowledge base. A knowledge base requires an inference
method and fact and goal representations. In knowledge-based systems, these two parts
constitute a knowledge representation paradigm.
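The vocabulary above (facts, rules, and an inference engine that derives findings and conclusions) can be made concrete with a generic forward-chaining loop. This Python sketch is not DICEtalk's inference engine. Facts are plain strings, and the two rules are hypothetical.

```python
def forward_chain(facts, rules):
    """Fire IF-THEN rules until no new fact can be derived.

    facts: iterable of fact strings.
    rules: list of (premises, conclusion) pairs; premises is a list.
    Returns the set of given plus derived facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)      # a newly derived fact
                changed = True
    return known


# Hypothetical facts and rules, for illustration only.
given = {"allowance is small", "process is qualified"}
rules = [
    (["allowance is small"], "border index is D"),                        # finding
    (["border index is D", "process is qualified"], "redesign border"),  # conclusion
]
derived = forward_chain(given, rules) - given
print(sorted(derived))  # ['border index is D', 'redesign border']
```

The first derived fact acts as an intermediate finding, and the second, which depends on it, acts as a conclusion. This mirrors the distinction drawn in the text.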

In the rest of this paper we describe the DICEtalk representation scheme,
which involves several language levels. We also delve into the problem-solving
aspects supporting system integration; the object-oriented problem-solving scheme is
based on a “dispatch-managing” model. Finally, we discuss specific examples.

2  DICEtalk  Knowledge  Representation  Scheme 

In DICEtalk, the knowledge description scheme is based on a surface language, an
intermediate language, and a deep language, with the internal languages hidden from
the user (Sobolewski, 1987, 1989a, 1989b, 1991b). Surface language sentences
appear as simplified English sentences that allow engineers to create knowledge bases
and metaknowledge bases naturally, without requiring them to learn specialized data
description and manipulation languages. The expressions of the intermediate language are
logical formulas of the formalized percept language for describing percepts and
metapercepts (Sobolewski, 1989b, 1991b). Surface sentences and percept formulas
share the same language primitives, i.e., the subject of a sentence and its
complements, which makes the conversion of natural sentences into percept formulas
easier and more natural. The deep language is Smalltalk-80, used for
implementing structured objects that represent logical (percept and metapercept)
formulas at the computer level (Goldberg and Robson, 1989).

The knowledge description language SPDL (Surface Percept Description
Language) is used to express declarative (factual) knowledge bases and metaknowledge bases,
as well as goal knowledge bases, whereas the programming language Smalltalk-80 is
dedicated to representing procedural knowledge bases. SPDL is defined by 44 EBNF
(Extended Backus-Naur Form) rules. To give a general idea of what SPDL is,
we list below 12 basic SPDL rules:

1   sentence = assumption | rule | goal | initData | question.

2   assumption = [entry] clause.

3   goal = [entry] clause.

4   rule = [entry] "IF" clause "THEN" clause.

7   clause = [["not"] subject] complements.

8   subject = path term | inputVariableConstant path.

9   path = attributeName [attributeName | variableName].

10  complements = complement {("," | "and" | "or") complement}.

11  complement = path ("is" | "are" | "=") term [definiteness] |
        path ["not"] decisionAttributeName [definiteness] |
        outputVariableConstant ("is" | "are" | "=") path |
        ("if" | "whether") decisionAttributeName inversePath |
        "no" attribute | "has" attribute | predicate.

14  term = literal [("and" | "or") literal] | number | interval | point | vector |
        date | time | multivalue | SmalltalkExpression.

15  literal = ["not"] (valueName | variableName).

17  predicate = "[" booleanExpression "]" |
        "[" SmalltalkExpression "for" variableName "]".

These rules explain the integration of declarative and procedural knowledge bases, which is
based on the Smalltalk expressions appearing in Rules 14 and 17. Generally speaking, a
clause is a description of an entity in terms of subjects and complements, as in natural
sentences. Subjects represent main qualities and quantities, whereas complements
represent complementary qualities and quantities. Qualities are expressed by paths
(sequences of attributes), and quantities by values, i.e., numbers, intervals, points,
names, etc. The following example describes the boring machine HBM2:

boring  machine  HBM2: 

table  area  is  1000, 
surface  finish  is  63, 
tolerance  is  0.01 

where each quality is expressed by a sequence of one or two attributes, the main
quantity is a name, and the complementary quantities are numbers. Logical connectives
(and, or, and not) are allowed for building compound complements and values, including
variables. When a subject is omitted from a clause, it is understood from context.
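To show how a clause splits into a subject (main quality and quantity) and complements, here is a hypothetical Python parser for the comma-separated "is" form used in the HBM2 example. It handles only a tiny subset of SPDL (no "and"/"or", intervals, negation, or variables) and is not DICEtalk's parser. The `Clause` and `parse_clause` names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject_path: list   # main quality, e.g. ["boring", "machine"]
    subject_value: str   # main quantity, e.g. "HBM2"
    complements: dict    # complementary quality -> quantity


def parse_clause(text):
    """Parse 'PATH VALUE: PATH is VALUE, PATH is VALUE, ...' (hypothetical subset)."""
    head, _, tail = text.partition(":")
    *path, value = head.split()
    complements = {}
    for part in tail.split(","):
        words = part.split()
        if not words:
            continue
        k = words.index("is")            # raises ValueError if 'is' is missing
        quantity = " ".join(words[k + 1:])
        try:
            quantity = float(quantity)   # numeric quantities become floats
        except ValueError:
            pass                         # names stay strings
        complements[" ".join(words[:k])] = quantity
    return Clause(path, value, complements)


clause = parse_clause("boring machine HBM2: table area is 1000, "
                      "surface finish is 63, tolerance is 0.01")
print(clause.subject_value, clause.complements)
```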

SmalltalkExpression stands for any sequence of statements (expression series) of
the Smalltalk language, possibly with variables, including the special variable “kb”,
which represents the current knowledge base of the DICEtalk system (a knowledge
base is an instance of class KnowledgeBase or its subclass); booleanExpression
denotes a SmalltalkExpression that evaluates to true or false (boolean objects). These
two forms of Smalltalk expressions provide a procedural attachment mechanism. This
mechanism is especially important in engineering design tasks, in which much of the
knowledge is given in the form of formulas and calculations that are most effectively
carried out by appropriate procedures or external programs.

3  Problem-Solving 

A problem-solving model is a scheme for organizing reasoning steps and domain
knowledge to construct a solution to a problem. The central issue of problem solving
is the question: which pieces of knowledge should be applied, and where and
how should they be applied? A problem-solving model provides both a conceptual
framework for organizing knowledge and a strategy for applying that knowledge. The
DICEtalk problem-solving model is object-oriented and based on a dispatch-managing
model. We can view such a dispatch-managing model as a natural
model of how a group of individual solvers can combine to solve a goal (problem).
The approach is to split the goal into simpler tasks and to solve each of
these tasks by a dispatch-managing module (DM module). A dispatch-managing
module consists of a task dispatcher and its manager. We assume that the tasks are not
independent, i.e., they interrelate in some way.
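A dispatch-managing hierarchy can be pictured as a recursive task tree. In this hypothetical Python sketch, each task is handled by a dispatcher whose kind mirrors the atomic-, and-, and or-dispatchers named in the paper. The manager's local strategy is reduced to choosing the order in which subtasks are tried, and a plain dict plays the working memory. `Task`, `dispatch`, and `manager_order` are assumptions, not DICEtalk's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    name: str
    kind: str = "atomic"                       # "atomic", "and", or "or"
    subtasks: List["Task"] = field(default_factory=list)
    solver: Optional[Callable[[dict], object]] = None   # atomic tasks only


def manager_order(task):
    """Manager's local strategy (assumed): try subtasks in listed order."""
    return list(task.subtasks)


def dispatch(task, memory):
    """Solve task, recording results in memory; return True on success."""
    if task.kind == "atomic":
        result = task.solver(memory) if task.solver else None
        if result is None:
            return False
        memory[task.name] = result
        return True
    solved = task.kind == "and"                # start value for all/any
    for sub in manager_order(task):
        ok = dispatch(sub, memory)
        if task.kind == "or" and ok:
            solved = True
            break                              # one alternative suffices
        if task.kind == "and" and not ok:
            solved = False
            break                              # one failed conjunct fails all
    if solved:
        memory[task.name] = "solved"
    return solved


# Hypothetical goal: both subtasks must succeed; the second has alternatives.
goal = Task("design", "and", [
    Task("load", solver=lambda m: 120.0),
    Task("material", "or", [
        Task("steel", solver=lambda m: None),          # this alternative fails
        Task("aluminium", solver=lambda m: "Al 6061"),
    ]),
])
memory = {}
print(dispatch(goal, memory))  # True
```

Each non-atomic task acts as a parent DM module. Its children report success or failure back through `dispatch`, loosely echoing the paper's point that parent and child modules communicate through their dispatchers.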

The dispatch-managing model deals with problem solving by separating a goal
into a hierarchical structure of subordinate tasks solved by DM modules. The
dispatch-managing architecture of the problem-solving engine thus consists of five basic
components:

1. The knowledge (five panels):

The main repository of goals, facts, procedures, and control advice.

2. The supervisor (master panel):

The master DM module deals with solving the user-defined goal. It creates the top
DM module and controls DM network activity.

3. The DM network (dispatcher and manager panels):

The problem-solving tasks are organized into a hierarchical structure related to
the current state of goal solving. Each local DM module deals with local task
solving, according to its local control strategy and a local knowledge base taken
as a subtask perspective of the knowledge base. Managers of local DM modules
are responsible to their dispatchers for local strategy: managers decide what
actions their dispatchers take next. Communication and interaction between
parent DM modules and their child modules take place through their dispatchers.

4. The working memory (data panel):

Subtasks are created by dispatchers and are scheduled to be solved by their
managers according to control advice. DM modules produce changes in the
working memory that lead incrementally to a global solution as a unification of
all local solutions. Parameters are user-defined characteristics evaluated by
dispatchers and then used as arguments of procedure calls.

5. The results (inference panel):

Problem-solving results (answers, findings, and conclusions) are created by the
supervisor and dispatchers and can be transferred to other knowledge bases as
part of their descriptive panels for distributed problem solving.

These five components form a DICEtalk global knowledge base implemented by the
highly structured class GlobalKnowledgeBase. An instance of this class can be
considered a kind of object-oriented distributed blackboard (Jagannathan, 1989).
DICEtalk can contain many concurrent global knowledge bases during complex
cooperative problem solving, as shown in Figure 1. All DICEtalk agents have the
same structure and can request services from and provide services to each other
(peer-to-peer processing). A superagent is an agent that may create and delete its agents.


Figure 1. A conceptual view of DICEtalk peer-to-peer processing: a superagent and its agents (legend: "." control flow; "-" data flow)

The DICEtalk object-oriented framework of the presented model is implemented by
three types of dispatchers (atomic-dispatcher, and-dispatcher, and or-dispatcher) and
three types of managers (task-manager, rule-forward-manager, and rule-backward-manager).
The problem-solving engine introduces an architecture that treats a
declarative and a procedural knowledge base as one active object. Fundamental to this
object-oriented integration is the notion of a DICEtalk knowledge base as an instance
of a class created when a user defines a new knowledge-based application. This new
class, say SubKnowledgeBase, is a subclass of the predefined KnowledgeBase
class. Facts and goals of a declarative knowledge base are stored in instance variables
of the KnowledgeBase class, whereas instance variables and user-defined methods of
the SubKnowledgeBase class form a procedural knowledge base. Thus, this
procedural knowledge base inherits a declarative knowledge base from its superclass
KnowledgeBase. Procedures defined as methods of the procedural knowledge base
can be called by the inference engine when tasks to be solved contain messages sent to
kb, the current DICEtalk knowledge base.
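The split between a predefined KnowledgeBase class (declarative facts and goals) and a user subclass (procedural methods) maps naturally onto ordinary class inheritance. The Python sketch below is only an analogy. `KnowledgeBase` mirrors the paper's class name, while `AdvisorKnowledgeBase`, `send`, and `index_value` are hypothetical. `send` imitates the inference engine sending a message to kb by looking the method up by name.

```python
class KnowledgeBase:
    """Predefined base class: holds the declarative knowledge base."""

    def __init__(self):
        self.facts = []   # declarative facts
        self.goals = []   # declarative goals

    def send(self, selector, *args):
        """Imitate the engine sending a message to kb: find the method
        named `selector` on this object (or its subclass) and call it."""
        return getattr(self, selector)(*args)


class AdvisorKnowledgeBase(KnowledgeBase):
    """Hypothetical user subclass (the SubKnowledgeBase role): its instance
    variables and methods form the procedural knowledge base."""

    GRADES = {"A": 4, "B": 3, "C": 2, "D": 1}   # grade letters as numbers

    def __init__(self):
        super().__init__()          # inherit the declarative part
        self.calls = 0              # procedural-side instance variable

    def index_value(self, letter):
        self.calls += 1
        return self.GRADES[letter]


kb = AdvisorKnowledgeBase()
kb.facts.append("border index is D")          # declarative, from the base class
print(kb.send("index_value", "D"), kb.calls)  # 1 1
```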


4 Knowledge-Based System Integration

The following three knowledge-based applications show how declarative and procedural
knowledge can be integrated within a single knowledge base and within an integrated
system consisting of other existing programs, built-in tools, and knowledge bases
describing how to perform a real, complex design process. These three applications are:

1. The Printed Wiring Board Manufacturability Advisor (PWBMA) (Padhy &
Dwivedi, 1990),

2. The Turbine Blade Fabrication Cost Advisor (Saidi, 1990), and

3. The Process Modeling Knowledge-Based System (a Process Modeling Project at
CERC, Sobolewski).

The open DICEtalk integration framework is illustrated in Figure 2. This framework
is based on three fundamental properties: homogeneity of knowledge-based
interactions, information exchange, and extensibility.


Figure 2. The DICEtalk open system framework (procedural knowledge layer: knowledge wrapper)


4.1  Printed  Wiring  Board  Manufacturability  Advisor 

The tasks in PWBMA are organized into a hierarchical structure. The main goal of the
system is to evaluate the mfg index. The following rule, used in backward reasoning,
shows the structure of the main goal in terms of its tasks:

IF

border index is x1
and manufacturing index is x2
and process index is x3
and [kb mfgIndex for y]

THEN

mfg index is y

ast1: parameter is Iborder, ast2: parameter is Imanufacturing, ast3: parameter is
Iprocess

where Iborder, Imanufacturing, and Iprocess are user-defined parameters (for the antecedent
subtasks, denoted by ast with indices) for border index is x1, manufacturing index is
x2, and process index is x3, respectively. When this rule is used in the backward
direction and the tasks ast1, ast2, and ast3 are solved, their dispatchers assign the
values of the variables x1, x2, and x3 to the parameters Iborder, Imanufacturing, and
Iprocess, respectively. These values can then also be used in the procedural knowledge
base as pool variables in methods of the current knowledge-base class. Therefore, when
the last antecedent task [kb mfgIndex for y] is executed, the message mfgIndex is sent
to kb, the current knowledge base, invoking execution of the corresponding Smalltalk
method. Within this method, the parameters Iborder, Imanufacturing, and Iprocess
defined above in the declarative knowledge base can be accessed as Smalltalk pool
variables instantiated by the inference engine:
mfgIndex

    ^(Iborder value + Imanufacturing value + Iprocess value) / 3

The result of the message is then used as the value substituted for the variable y, and
the inference engine returns the task mfg index is y, appropriately instantiated, as a
conclusion.
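The backward use of this rule can be imitated in a few lines of Python. In this hypothetical sketch, rules are (conclusion, antecedents, procedure) triples, and each antecedent carries the name of its user-defined parameter. The procedure stands in for the final [kb mfgIndex for y] task. The known index values 1, 3, and 4 are made-up numeric stand-ins for grades D, B, and A.

```python
def solve(goal, rules, facts, params):
    """Backward-chain on goal; return its value or None if unprovable.

    rules: list of (conclusion, [(subgoal, parameter_name), ...], procedure)
    facts: dict of already known goal values
    params: dict filled with parameter bindings, as the dispatchers do
    (no cycle detection: cyclic rules would recurse forever)
    """
    if goal in facts:
        return facts[goal]
    for conclusion, antecedents, procedure in rules:
        if conclusion != goal:
            continue
        values = [solve(sub, rules, facts, params) for sub, _ in antecedents]
        if any(v is None for v in values):
            continue                           # try another rule
        for (_, name), value in zip(antecedents, values):
            params[name] = value               # e.g. x1 -> Iborder
        facts[goal] = procedure(params)        # e.g. [kb mfgIndex for y]
        return facts[goal]
    return None


def mfg_index(p):
    """Counterpart of the mfgIndex method: mean of the three indices."""
    return (p["Iborder"] + p["Imanufacturing"] + p["Iprocess"]) / 3


rules = [("mfg index",
          [("border index", "Iborder"),
           ("manufacturing index", "Imanufacturing"),
           ("process index", "Iprocess")],
          mfg_index)]
facts = {"border index": 1, "manufacturing index": 3, "process index": 4}
params = {}
print(solve("mfg index", rules, facts, params))  # (1 + 3 + 4) / 3
```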

Tasks  in  PWBMA  consist  of  other  compound  or  elementary  subtasks.  At  the 
bottom  of  the  task  hierarchy,  elementary  tasks  are  solved  by  rules,  such  as  the 
following  one  for  the  task  border  index: 

IF

board border allowance is x1
and [x1 < 1.0]
and [kb comment: 'the border allowance should be 1 inch']

THEN

border index is D

The attribute index takes the values A, B, C, and D, interpreted in the procedural
knowledge base as the numbers 4, 3, 2, and 1, respectively. In the above rule, the
second antecedent task ([x1 < 1.0]) and the third task ([kb comment: 'the border
allowance should be 1 inch']) are executed as procedural tasks by the
Smalltalk compiler, which is called by the dispatchers for these tasks.
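The elementary rule and the letter-to-number interpretation can be sketched together. This hypothetical Python function covers only the one rule quoted above: an allowance below 1.0 inch gives grade D plus an advisory comment. The other border-index rules are not given in the source, so the function returns None when this rule does not apply.

```python
GRADE_VALUE = {"A": 4, "B": 3, "C": 2, "D": 1}   # procedural interpretation


def border_index(allowance, comments):
    """Apply the quoted PWBMA rule; return a grade letter or None.

    allowance: board border allowance in inches (x1 in the rule)
    comments: list collecting advice, standing in for [kb comment: ...]
    """
    if allowance < 1.0:                                   # [x1 < 1.0]
        comments.append("the border allowance should be 1 inch")
        return "D"
    return None   # other rules, not shown in the source, would decide


advice = []
grade = border_index(0.75, advice)
print(grade, GRADE_VALUE[grade], advice)
```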

4.2  Turbine  Blade  Fabrication  Cost  Advisor 

Consider the following rule in the Turbine Blade Fabrication Cost Advisor:

IF

cluster blade number is x1
and pattern assembly time is x2
and cluster dress time is x3
and cluster inspection time is x4
and [kb kpp for y]

THEN

pattern wax cost is y

ast1: parameter is Neb, ast2: parameter is Tpa, ast3: parameter is Ted, ast4: parameter
is Tci

where Neb, Tpa, Ted, and Tci are user-defined parameters for cluster blade number is x1,
pattern assembly time is x2, cluster dress time is x3, and cluster inspection time is
x4, respectively. Again, when this rule is used in the backward direction and the tasks
ast1, ast2, ast3, and ast4 are proved, the inference engine assigns the values of the
variables x1, x2, x3, and x4 to the parameters Neb, Tpa, Ted, and Tci, respectively. Next,
when the task [kb kpp for y] is executed, the message kpp is sent to the current knowledge
base. The result of that message is used as the value substituted for the variable y, and
the inference engine then returns the task pattern wax cost is y,
appropriately instantiated, as a finding. The message kpp is implemented as the
following Smalltalk method:
kpp

    | kppCost |
    kppCost := 1 / Neb value * (Rpl value + Tpa value + Ted value + Tci value)
        * kclCost + kpCost.
    tbfCost := tbfCost + kppCost.
    ^kppCost asFloat

where  tbfCost,  kclCost  and  kpCost  are  instance  variables  (for  other  costs)  in  the 
procedural  knowledge  base,  and  Neb,  Rpl,  Tpa,  Ted,  and  Tci  are  parameters  defined 
above  in  the  declarative  knowledge  base  for  the  rule  and  treated  as  Smalltalk  pool 
variables  in  the  procedural  knowledge  base  and  therefore  in  the  method  kpp. 
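A hypothetical Python counterpart of the kpp method makes the arithmetic and the side effect on tbfCost explicit. It assumes that the leading factor of the printed formula is 1 / Neb, reconstructed from a damaged token in the scan. It also relies on Smalltalk's strict left-to-right evaluation of binary operators, which here coincides with Python's precedence: ((1 / Neb) * sum) * kclCost + kpCost. The class and attribute names are assumptions.

```python
class TurbineBladeCostKB:
    """Hypothetical procedural knowledge base for the cost advisor."""

    def __init__(self, kcl_cost, kp_cost):
        self.kcl_cost = kcl_cost   # plays the role of instance variable kclCost
        self.kp_cost = kp_cost     # plays the role of instance variable kpCost
        self.tbf_cost = 0.0        # running total, like tbfCost

    def kpp(self, p):
        """Pattern wax cost; p maps parameter names to their bound values."""
        times = p["Rpl"] + p["Tpa"] + p["Ted"] + p["Tci"]
        # Assumed reading of the formula: 1 / Neb * times * kclCost + kpCost.
        kpp_cost = 1 / p["Neb"] * times * self.kcl_cost + self.kp_cost
        self.tbf_cost += kpp_cost  # accumulate into the total, as kpp does
        return float(kpp_cost)


kb = TurbineBladeCostKB(kcl_cost=10.0, kp_cost=5.0)   # made-up rates
params = {"Neb": 4, "Rpl": 2.0, "Tpa": 3.0, "Ted": 1.0, "Tci": 2.0}
print(kb.kpp(params), kb.tbf_cost)  # 25.0 25.0
```

Each call both returns a value, which would be bound to y, and updates the running total. This is how the method feeds the overall cost while answering a single task.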


As the above two examples illustrate, the presented scheme of declarative and
procedural knowledge representation is based on the following mechanisms:

1. Transfer of data from the inference engine to the procedural knowledge base via
user-defined parameters associated with antecedent tasks in rules.

2. Procedure calls by the inference engine to execute tasks in the form
[booleanExpression]. If the variable kb appears in booleanExpression,
methods of the current procedural knowledge base are executed (SPDL Rule
17).

3. Transfer of data from the procedural knowledge base to the inference engine by
executing tasks in the form [SmalltalkExpression for variableName] (SPDL
Rule 17).
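The three mechanisms can be summarized in one small evaluator. In this hypothetical Python sketch, a bracketed task is either ("test", fn) for [booleanExpression] or ("for", fn, var) for [SmalltalkExpression for variableName]. fn receives the kb object and the current bindings, which play the role of the parameters in mechanism 1. A "for" task writes its result back into the bindings (mechanism 3). Python callables stand in for Smalltalk expressions, and `run_bracket_task` and `DemoKB` are invented names.

```python
def run_bracket_task(task, kb, bindings):
    """Execute one bracketed task; return True if it succeeds."""
    form = task[0]
    if form == "test":                       # [booleanExpression]
        _, fn = task
        return bool(fn(kb, bindings))        # mechanism 2
    if form == "for":                        # [expression for variableName]
        _, fn, var = task
        bindings[var] = fn(kb, bindings)     # mechanism 3: value flows back
        return True
    raise ValueError(f"unknown task form: {form!r}")


class DemoKB:
    """Any object can play kb; this one has a single hypothetical method."""

    def double(self, x):
        return 2 * x


bindings = {"x1": 0.5}                       # mechanism 1: bound parameters
kb = DemoKB()
ok = run_bracket_task(("test", lambda kb, b: b["x1"] < 1.0), kb, bindings)
run_bracket_task(("for", lambda kb, b: kb.double(b["x1"]), "y"), kb, bindings)
print(ok, bindings["y"])  # True 1.0
```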


4.3  Process  Modeling  Knowledge-Based  System 

The DICEtalk distributed knowledge-based environment is also supported by the above
mechanisms. Consider the knowledge base machining in the Process Modeling
Knowledge-Based System, which contains the following SPDL tasks, as well as a
knowledge base about these tasks:

1. Evaluating design modifications

2. Evaluating machining costs for design features

3. Evaluating machining processes

Each of the tasks itself has an appropriate knowledge base. If the first task is selected
to be solved, then the following rule is to be fired:

IF

[kb solveWith: '\usr\kbs\design']

THEN

design modifications is evaluated

and the task [kb solveWith: '\usr\kbs\design'] is executed. The method solveWith: is
a standard DICEtalk method defined in the KnowledgeBase class as follows:

solveWith:  aString 

Switch  to  a  global  knowledge  base  with  name  aString  and 
open  Problem  Solving  Browser  on  it.  Close  Problem  Solving 
Browser  opened  on  the  receiver's  global  knowledge  base. 

A new DICEtalk agent for the knowledge base design (to be found in the disk
directory \usr\kbs) is created with the following tasks:

1. Specifying surface coordinates

2. Evaluating profile coordinates refinements

3. Specifying superimpose TRUCE code

4. Specifying cutter path TRUCE code

If  the  first  task  is  selected  to  be  solved,  then  the  following  rule  is  to  be  fired: 

IF

[kb inputFile: '\usr\kbs\design';
    cd: '\usr\kbs\tools' call: 'tur';
    turOutput: 'file2.dat';
    accumulate: #findings as: #assumptions]

THEN

surface coordinates are evaluated

and the four messages are sent to the current knowledge base. The methods cd:call: and
accumulate:as: are standard DICEtalk methods defined in the KnowledgeBase
class as follows:

accumulate: inSymbol as: outSymbol

    Add all elements of the current global knowledge-base
    collection with the type inSymbol to the elements of the
    next global knowledge-base collection with the type
    outSymbol.

cd: aPath call: aString

    Change directory to path name aPath and call a routine
    aString there.

The two methods inputFile: and turOutput: form a kind of DICEtalk wrapper for the
FORTRAN program named “tur” and are defined by the user as part of the procedural
knowledge base.
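The cascade in the rule above sends four messages to the same receiver. A hypothetical Python analogue follows. Its methods return self so that the calls can be chained. `cd_call` runs an external program with a given working directory via `subprocess.run`. `tur_output` is assumed to read whitespace-separated values from the program's output file as findings. `accumulate` copies one collection of the current knowledge base into a collection of the next one. The real inputFile: and turOutput: are user-defined and not shown in the paper, so their behavior here is invented for illustration. A one-line Python command stands in for the FORTRAN program.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class WrapperKB:
    """Hypothetical knowledge base wrapping an external program."""

    def __init__(self, next_kb=None):
        self.collections = {"findings": [], "assumptions": []}
        self.next_kb = next_kb   # the next global knowledge base
        self.input_name = None
        self.cwd = None

    def input_file(self, name):
        self.input_name = name   # assumed: just remember the input file
        return self

    def cd_call(self, directory, command):
        """Like cd:call:, run command with directory as working directory."""
        self.cwd = Path(directory)
        subprocess.run(command, cwd=directory, check=True)
        return self

    def tur_output(self, filename):
        """Assumed behavior: read whitespace-separated values as findings."""
        text = (self.cwd / filename).read_text()
        self.collections["findings"].extend(text.split())
        return self

    def accumulate(self, in_symbol, out_symbol):
        """Like accumulate:as:, add one collection to the next KB's."""
        self.next_kb.collections[out_symbol].extend(self.collections[in_symbol])
        return self


with tempfile.TemporaryDirectory() as tools:
    nxt = WrapperKB()
    kb = WrapperKB(next_kb=nxt)
    # Stand-in for "tur": writes file2.dat in its working directory.
    fake_tur = [sys.executable, "-c",
                "with open('file2.dat', 'w') as f: f.write('1.5 2.5')"]
    (kb.input_file("design")
       .cd_call(tools, fake_tur)
       .tur_output("file2.dat")
       .accumulate("findings", "assumptions"))
print(nxt.collections["assumptions"])  # ['1.5', '2.5']
```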



5  Conclusions 

This paper has presented an approach to system integration based on a knowledge-based
paradigm. A knowledge base is treated as one active object consisting of both declarative and
procedural knowledge bases. During problem solving, data is exchanged bidirectionally
between declarative and procedural knowledge. The feasibility of using a knowledge-based
paradigm in developing systems and system integration for a CE environment
has been demonstrated. In the DICE (DARPA Initiative in Concurrent Engineering)
program, more than twenty knowledge-based engineering tasks were experimentally
implemented with the system: Duct connection (1), Flange connection (2), Material
fatigue (3), Disk forging (4), Robot assembly advisor (5), Turbine blade fabrication
cost advisor (6), Printed board manufacturability advisor (7), Printed board
assemblability advisor (8), Assembly planning simulator (APS) (9), Design process
model (PROCESS) (10), Machining processes (11), Preliminary Design Evaluation
Module (PDEM) (12), Cost estimation for machining processes (13), Manufacturing
process planning (14), Machine operation advisor (15), Machine selection advisor
(16), Tool material type selection advisor (17), Tool material selection advisor (18),
Cutting fluid selection advisor (19), Knowledge-Based PARametric Finite Element
Modeler for aircraft engine blades (KB-PARFEM) (20), and Knowledge-based MODal
Finite Element Modeler (MODFEM) (21).

Task 10 integrates all the tasks into a design process model, allowing designers to
navigate between relevant knowledge bases and engineering tools during problem
solving. Task 10 includes Task 11 as a subtask, which itself includes Tasks 12 and 14
as its subtasks. Task 12 integrates four FORTRAN programs, and Task 13 integrates
two FORTRAN programs, within the DICEtalk knowledge bases. Tasks 15, 16, 17, 18,
and 19 are implemented as knowledge bases integrated as subtasks within the
knowledge base of Task 14. Task 20 implements a knowledge base for finite element
mesh calculation (integrating a set of external C programs and Smalltalk methods)
and then calls the tool I-DEAS to do the final finite element analysis. Task 21
implements a knowledge base for geometric parameter selection, substructuring
selection, and finite element modeling/analysis (integrating a collection of external
C programs and Smalltalk methods) and then calls I-DEAS to do the final
substructuring analysis.
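The subtask nesting described above forms a tree. The Python below is only an illustrative sketch. It records just the containment relations stated in the text: Task 10 contains Task 11, Task 11 contains Tasks 12 and 14, and Task 14 contains Tasks 15 to 19. All other tasks are omitted. The names SUBTASKS and all_subtasks are hypothetical and are not part of DICEtalk.

```python
# Hypothetical encoding of the subtask relations stated in the text only.
SUBTASKS: dict[int, list[int]] = {
    10: [11],                  # design process model (PROCESS)
    11: [12, 14],              # machining processes
    14: [15, 16, 17, 18, 19],  # manufacturing process planning
}

def all_subtasks(task: int) -> set[int]:
    """Return every task nested (directly or indirectly) below `task`."""
    found: set[int] = set()
    stack = list(SUBTASKS.get(task, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(SUBTASKS.get(t, []))
    return found

print(sorted(all_subtasks(10)))  # → [11, 12, 14, 15, 16, 17, 18, 19]
```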

The successful implementation of the experimental applications with DICEtalk
suggests that the system is practically usable. Applications were easy and fast to
implement, and the user interface and overall system functionality were well received
by the users. All the applications mentioned above have been integrated into a design
process model under one unifying umbrella, providing an intuitive, uniform model of
interactions.

References 

Benner,  J.C.  (1991).  KB-PARFEM:  A  Knowledge-based  Parametric  Finite  Element 
Modeler  for  Aircraft  Engine  Blades.  Master’s  Thesis,  West  Virginia  University, 
Morgantown. 

Chung, S. (1992). Knowledge-Based Mechanical Design Using Substructuring in a
Concurrent Engineering Environment. Doctoral Dissertation, West Virginia
University, Morgantown.


Du,  B.,  Rachakonda,  S.,  Dwivedi,  S.,  Karinthi,  R.,  Sobolewski,  M.,  and  Dax,  R. 
(1992).  Forging  Process  Design  in  a  Concurrent  Engineering  Environment.  J.P. 
Hager  (ed.)  EPD  Congress  1992,  A  Publication  of  The  Minerals,  Metals  & 
Materials  Society,  pp.  531-545. 

Goldberg,  A.,  and  Robson,  D.  (1989).  Smalltalk-80:  The  Language,  Reading,  MA: 
Addison-Wesley. 

Kulpa, Z., and Sobolewski, M. (1992). Knowledge-directed Graphical and Natural
Language Interface with a Knowledge-based Concurrent Engineering Environment.
Proc. of the 8th Int. Conference on CAD/CAM, Robotics and Factories of the
Future '92, Aug 1992, Metz, France, pp. 238-248.

Padhy, S.K., and Dwivedi, S.N. (1990). A Knowledge Based Approach for
Manufacturability of Printed Wiring Boards. Proc. of the Fifth Int. Conference on
CAD/CAM, Robotics and Factories of the Future '90.

Padhy,  S.K.  (1990).  A  Knowledge-Based  System  for  PWB  Manufacturability  in 
Concurrent  Engineering  Environment.  Master’s  Thesis,  West  Virginia  University, 
Morgantown. 

Saidi,  M.  (1991).  Turbine  Blade  Investment  Casting  Cost  Advisor  Model.  Proc.  of 
AACE’s  1991  Annual  Meeting,  Seattle,  WA. 

Sobolewski,  M.  (1987).  Percept  Knowledge-base  Systems.  I.  Plander  (Ed.),  Artificial 
Intelligence  and  Information  -  Control  Systems  of  Robots.  North-Holland. 

Sobolewski,  M.  (1989a).  EXPERTALK:  An  Object-Oriented  Knowledge-based 
System.  I.  Plander  (Ed.),  Artificial  Intelligence  and  Information-Control  Systems  of 
Robots.  North-Holland. 

Sobolewski,  M.  (1989b).  Percept  Knowledge  Description  and  Representation,  ICS 
PAS  Reports  No.  663.  Institute  of  Computer  Science  of  Polish  Academy  of 
Sciences. 

Sobolewski, M. (1990a). Percept Knowledge and Concurrency, Proc. of the Second
National Symposium on Concurrent Engineering, West Virginia University,
Morgantown, pp. 111-137.

Sobolewski, M. (1990b). DICEtalk: An Object-Oriented Knowledge-Based
Engineering Environment. In: CAD/CAM, Robotics and Factories of the Future
'90, Vol. 1: Concurrent Engineering, Berlin: Springer-Verlag, pp. 117-122.

Sobolewski, M. (1991a). Object-Oriented Knowledge Bases in Engineering
Applications. CAD/CAM, Robotics and Factories of the Future '91, Southbank
Press, Vol. 1, pp. 470-475.

Sobolewski, M. (1991b). Percept Conceptualizations and Their Knowledge
Representation Schemes. Z.W. Ras and M. Zemankova (Eds.) Methodologies for
Intelligent Systems, Lecture Notes in AI 542, Berlin: Springer-Verlag, pp. 236-245.

Sobolewski, M. (1991c). Integration of Declarative and Procedural Knowledge in
Engineering Applications. Expert Systems World Congress Proceedings,
Pergamon Press: New York, Vol. 3, pp. 1816-1823.


A  Reflective  Strategic  Problem  Solving  Model 


Patricia  Charlton 

School  of  Mathematical  Sciences 
Bath  University 
Claverton  Down 
Bath  BA2  7AY 
England 

pc@maths.bath.ac.uk


Abstract. We define a strategic model which uses reflection as part of
its control framework. A strategy is a process of building up units of
knowledge sources. We use Newell's [5] control model to give structure.
The control structures are classed as unit and part-of-unit processes. The
unit is the production statement, and the process is how the rule will be
applied. The part-of unit is the reflective control for the clustering and
building of knowledge sources. This method helps to clarify the problem
during the process of abstraction. Our blackboard interpreter, which is used
to provide part of the control, is extended in a similar way to OPM [8],
using heterarchical abstractions. As the strategic model is developed to
provide a reflective reasoning system, we show how the interpreter is
extended to provide an adaptive learning system.


1  Introduction 

In this paper we give a brief outline of our blackboard interpreter and show how
to extend its structure to give a reflective model for building problem-solving
strategies. The application implemented using our model is a forest-fire-fighting
simulation [2]. We show the changes in the blackboard model to explain how to
deal with abstraction, that is, problem definition and solution formation. We use
fine-grain knowledge source structures to assist in problem representation and
abstraction clarity. The reflective process provides control [10] in a complex and
changing environment. The problem domain represented for the reflective process
is that of strategies for problem solving. One approach to problem solving is
planning, and the planning process requires abstraction, which is used to fill in
the vague outline of the strategy. In order to reason about the problem, the
system is required to provide the properties of strategies, planning and definition
in a recursive form. The problem-solving model described here provides a structure
for defining such problems. First, however, we describe the blackboard
interpreter.

2  The  Blackboard  Interpreter 

Often in AI, there is difficulty in clearly identifying how the problem should be
defined in a given system, and the problem definition starts to depend on the
system being used. Blackboard interpreters have been found difficult to use for
building strategic planners for general cases [3]. We support and amplify these
findings: when using such systems we have found the control to be restrictive.
Strategies require no restriction in the problem definition, but control has to be
developed over time.

The blackboard system provides only a general environment that is both restrictive
and uncontrolled. This is close to the strategy context, but we need to refine the
blackboard structure offered so that complex problems can be handled with an
approach which has strategic control. The revised blackboard model gives the
necessary control for an adaptive learning system using a strategic method.

The blackboard model requires a problem abstraction for the knowledge
sources. The first level of abstraction is to define areas which classify the system.
The application used for this discussion is forest fire fighting [2]. A simple
abstraction of the knowledge sources for the problem is bulldozer, fire and
environment. Further abstractions, such as the control centre, can also be
considered; indeed, many knowledge source levels can be abstracted for this
application. This helps to illustrate the fine-grain knowledge source (KS) definition
problem identified by Tate [15], and the abstraction hierarchy difficulties
outlined by Craig [3]. Although the system can be defined within the blackboard
interpreter, problems are highlighted when trying to represent a dynamic
application. Below, we show the template for a simple bulldozer knowledge source
requiring a move rule.

Specifying  a  KS  for  the  Bulldozer  Domain 

Knowledge-Source-Name  KS-Bulldozer
    Trigger-Condition  $EVENT-L=1
                       or
                       $ENTRY=fire
    Precondition       t
    Condition-Action pair:  Rule-id    Move
                            Condition  $ENTRY=fire
                            Action     Move-Bulldozer
    (Local-Variables ())

Each rule is given an identifier, which is matched to a user-defined event. Events
can have states, which can be changed by the activated knowledge sources as a
result of posting an entry by means of the agenda. Once on the blackboard, the
entries can be modified by the knowledge sources, provided the agenda allows the
change.
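The following minimal Python sketch makes the template concrete. It models a KS with a trigger, a precondition and one condition-action rule. It is not Craig's interpreter. We assume that $EVENT-L=1 means "the triggering event is on level 1" and that $ENTRY=fire means "the posted entry is named fire". The classes Rule and KnowledgeSource, and the effect written by move_bulldozer, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Entry = dict  # assumed representation of a blackboard entry: attribute -> value

@dataclass
class Rule:
    rule_id: str
    condition: Callable[[Entry], bool]
    action: Callable[[Entry], None]

@dataclass
class KnowledgeSource:
    name: str
    trigger: Callable[[int, Entry], bool]   # (event level, entry) -> triggered?
    precondition: Callable[[Entry], bool]
    rules: list[Rule] = field(default_factory=list)

    def fire(self, event_level: int, entry: Entry) -> list[str]:
        """If triggered and the precondition holds, run every rule whose
        condition matches; return the ids of the rules that fired."""
        if not (self.trigger(event_level, entry) and self.precondition(entry)):
            return []
        fired = []
        for rule in self.rules:
            if rule.condition(entry):
                rule.action(entry)
                fired.append(rule.rule_id)
        return fired

def move_bulldozer(entry: Entry) -> None:
    entry["bulldozer-moving"] = True  # hypothetical effect of Move-Bulldozer

ks_bulldozer = KnowledgeSource(
    name="KS-Bulldozer",
    trigger=lambda level, e: level == 1 or e.get("name") == "fire",
    precondition=lambda e: True,  # Precondition t
    rules=[Rule("Move", lambda e: e.get("name") == "fire", move_bulldozer)],
)

entry = {"name": "fire"}
print(ks_bulldozer.fire(event_level=2, entry=entry))  # → ['Move']
```

The trigger and precondition stay separate from the rules, as in the template. The trigger decides whether the KS is activated at all. Each rule's condition decides which actions then run.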

The agenda allows the control to determine which knowledge source will
contribute to the problem. The knowledge source is executed as a knowledge source
activation record and may contribute to forming the problem solution. The KSs
are static structures, while the knowledge source activation records (KSARs)
are dynamic and can be modified. It is the entries posted onto the blackboard
which allow KSs to be triggered. Once the triggered KSs, which are represented by
KSARs, become eligible, they can be executed. A KSAR may add a new event to the
blackboard, modify an event attribute, or add an event attribute. The action part
of the knowledge source may be built from a structure called propose (although
actions do not have to be propose structures). The propose structure causes events
to be modified and created, and these event activities may cause changes to the
blackboard entries. Below we show an example of a propose structure.

(PROPOSE :change-type 'add
         :entry 'create-fire
         :level ($VALUE fire-level)
         :attribute 'fire-temperature
         :value (compute-temperature ($GET 'create-fire 'fire-position)))

The  purpose  of  this  action  is  to  calculate  the  temperature  of  the  area  which  is 
now  on  fire.  The  fire-level  is  set  by  the  trigger  function  and  compute-temperature 
is  an  external  application  function. 
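The effect of this 'add proposal can be sketched as follows. The sketch assumes that the blackboard is a nested dictionary (level, then entry, then attribute). It also assumes that the variable fire-level evaluates to a level named "fire-level". compute_temperature is a stand-in whose formula is invented for illustration. Only the 'add change type is modelled.

```python
from dataclasses import dataclass
from typing import Any

# Assumed layout: level name -> entry name -> {attribute: value}
Blackboard = dict[str, dict[str, dict[str, Any]]]

@dataclass
class Proposal:
    change_type: str   # only 'add' is handled in this sketch
    entry: str
    level: str
    attribute: str
    value: Any

def apply_proposal(bb: Blackboard, p: Proposal) -> None:
    """Apply an 'add' proposal: attach attribute/value to the entry on its level."""
    if p.change_type != "add":
        raise ValueError(f"unsupported change type: {p.change_type}")
    bb.setdefault(p.level, {}).setdefault(p.entry, {})[p.attribute] = p.value

def compute_temperature(position: tuple[int, int]) -> float:
    """Hypothetical stand-in for the external application function."""
    x, y = position
    return 600.0 + 10.0 * (x + y)

bb: Blackboard = {"fire-level": {"create-fire": {"fire-position": (3, 4)}}}
pos = bb["fire-level"]["create-fire"]["fire-position"]   # plays the role of $GET
apply_proposal(bb, Proposal("add", "create-fire", "fire-level",
                            "fire-temperature", compute_temperature(pos)))
print(bb["fire-level"]["create-fire"]["fire-temperature"])  # → 670.0
```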

Events are changes to the blackboard state and are caused by an entry being
posted. At any given time, an entry exists on one level of abstraction of the
blackboard; entries can change levels via an event. Networks of entries are created
as a solution forms on the blackboard. The networks are formed between entries
through explicit representation of relationships, and the links between entries are
made through the attribute-value structure.
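A minimal sketch of these rules follows. Each entry sits on exactly one level at a time, and an event can move it to another level. Entries are linked by storing one entry's name as another entry's attribute value, so the solution network is whatever is reachable through those values. The class and the level names are hypothetical.

```python
class Blackboard:
    """Hypothetical blackboard: every entry is on exactly one level at a time."""

    def __init__(self) -> None:
        self.level_of: dict[str, str] = {}           # entry -> current level
        self.attrs: dict[str, dict[str, str]] = {}   # entry -> attribute -> value

    def post(self, entry: str, level: str) -> None:
        self.level_of[entry] = level
        self.attrs.setdefault(entry, {})

    def move(self, entry: str, new_level: str) -> None:
        """An event changes the level an entry sits on."""
        self.level_of[entry] = new_level

    def link(self, src: str, attribute: str, dst: str) -> None:
        """Relate two entries explicitly through an attribute-value pair."""
        self.attrs[src][attribute] = dst

    def network(self, start: str) -> set[str]:
        """Entries reachable from `start` by following attribute values."""
        seen: set[str] = set()
        todo = [start]
        while todo:
            e = todo.pop()
            if e in seen:
                continue
            seen.add(e)
            todo.extend(v for v in self.attrs.get(e, {}).values() if v in self.attrs)
        return seen

bb = Blackboard()
bb.post("fire-1", "environment")
bb.post("bulldozer-1", "operation")
bb.link("bulldozer-1", "target", "fire-1")
bb.move("fire-1", "critical")
print(bb.level_of["fire-1"], sorted(bb.network("bulldozer-1")))
# → critical ['bulldozer-1', 'fire-1']
```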

The communication and the agenda are essentially the control. The control
structure says how and when a knowledge source may add to the blackboard.
Such an agenda structure is mainly problem-dependent when using the blackboard
interpreter [5]. This occurs because the blackboard model is described as an
abstract structure [5] and, as such, becomes problem-dependent as the organisation
framework is implemented. This can make the model less adaptive and more
domain-dependent, as discussed by Craig [3]. Other problems are encountered by the
control mechanism for updating the blackboard [11]. We propose a networking
structure for developing a number of levels, mainly concerned with developing the
agenda, updating the blackboard and the reflective process. The strategic
model overcomes some of the restrictions in the blackboard model. To achieve
an adaptive model we need to add another level of control.
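The triggering and selection cycle described in this section can be sketched as a loop. This is a generic blackboard control loop built on our own assumptions, not the interpreter's actual scheduler. Each KS triggers on one event type, and every triggering creates a separate KSAR. The agenda always executes the highest-rated KSAR, with ties broken by creation order. An executed KSAR returns the new events it posts. All names are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class KS:
    """Static knowledge source (hypothetical fields)."""
    name: str
    triggers_on: str                  # event type that triggers this KS
    rating: int                       # used by the agenda to rank activations
    action: Callable[[], list[str]]   # returns the new events it posts

@dataclass(order=True)
class KSAR:
    """Dynamic activation record: one per triggering of a KS."""
    priority: int                     # negated rating, so heapq pops the best first
    seq: int                          # creation order breaks ties
    ks: KS = field(compare=False)
    event: str = field(compare=False)

def run(kss: list[KS], events: list[str], max_cycles: int = 10) -> list[str]:
    """Trigger KSARs from posted events, let the agenda execute the best one,
    and feed the events it posts back in; return the order of execution."""
    agenda: list[KSAR] = []
    trace: list[str] = []
    seq = 0
    pending = list(events)
    for _ in range(max_cycles):
        for ev in pending:
            for ks in kss:
                if ks.triggers_on == ev:
                    heapq.heappush(agenda, KSAR(-ks.rating, seq, ks, ev))
                    seq += 1
        if not agenda:
            break
        ksar = heapq.heappop(agenda)
        trace.append(ksar.ks.name)
        pending = ksar.ks.action()
    return trace

kss = [
    KS("KS-Bulldozer", "fire-detected", 5, lambda: ["bulldozer-moved"]),
    KS("KS-Environment", "fire-detected", 2, lambda: []),
    KS("KS-Report", "bulldozer-moved", 1, lambda: []),
]
print(run(kss, ["fire-detected"]))
# → ['KS-Bulldozer', 'KS-Environment', 'KS-Report']
```

The KS objects never change, while KSARs are created and consumed on each cycle. This mirrors the static/dynamic distinction drawn above.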

3  Defining  the  new  model 

Planning is a method of controlling and monitoring changes in the state of a
problem environment. States are either static or dynamic, and they are used to
represent partial problem abstractions of the system (the system being the problem
definition). In the blackboard, every KSAR executed is a state change. The
user-defined events for the blackboard allow an event to change, the actions being
modify, add or new. The strategic networking structure is defined to allow the user
to define powerful control systems easily. Control structures work on critical state
changes as a priority mechanism. To achieve control, levels of state inference are
defined in terms of unit and part-of unit processes. This becomes important to a
reflective or general problem-solving system, as the perception and problem
description of the objective can then be presented in context.
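The three event actions, and the priority given to critical state changes, can be illustrated with a small sketch. We assume the following behaviour. new creates an event. add attaches an attribute that must not already exist. modify overwrites an attribute. Critical changes are applied before the others, and non-critical changes keep their original order. None of these names come from the paper's system.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StateChange:
    kind: str                       # 'new', 'add' or 'modify' (assumed semantics)
    event: str
    attribute: Optional[str] = None
    value: Any = None
    critical: bool = False

def apply_changes(state: dict[str, dict[str, Any]],
                  changes: list[StateChange]) -> list[str]:
    """Apply the changes, critical ones first; return the order in which they ran."""
    applied = []
    # sorted() is stable, so non-critical changes keep their relative order.
    for ch in sorted(changes, key=lambda c: not c.critical):
        if ch.kind == "new":
            state[ch.event] = {}
        elif ch.kind == "add":
            if ch.attribute in state[ch.event]:
                raise KeyError(f"{ch.attribute!r} already present on {ch.event!r}")
            state[ch.event][ch.attribute] = ch.value
        elif ch.kind == "modify":
            state[ch.event][ch.attribute] = ch.value
        else:
            raise ValueError(f"unknown change kind {ch.kind!r}")
        applied.append(f"{ch.kind}:{ch.event}")
    return applied

state = {"fire": {"size": 1}}
order = apply_changes(state, [
    StateChange("new", "refuel"),
    StateChange("add", "refuel", "fuel", 50),
    StateChange("modify", "fire", "size", 3, critical=True),
])
print(order)   # → ['modify:fire', 'new:refuel', 'add:refuel']
print(state)   # → {'fire': {'size': 3}, 'refuel': {'fuel': 50}}
```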




3.1  The  problem  context 

To give emphasis to a problem context requires an appreciation of the problem in
terms of itself, i.e. the sub-system needs to be detached while still being part of
the problem. This again fits well into an orthogonal system which is reflective and
infinite. The problem organisation hierarchy requires survival towards the problem,
the context, and the approach.

The reflective process has no strict hierarchical control but allows
sub-components in terms of understanding the interaction that each component has,
i.e. the components provide the part-of unit process for the sub-components.
The next level of reasoning [9] informs the system how that control should be
handled without breaking the code of the strategy, i.e. it should be flexible enough
to avoid destruction, which is a control to achieve a result.

For the reflective process we define the meta-state, which gives meta-state-level
control. The meta-objects, which define the reasoning rules [6], are needed to
decide whether we can solve the problem at this level. However, this can be seen as
one process, as Smith [12] discusses in identifying the components and realising
the structure: the decision process looks after the state process, which in turn
looks after the decision process.

The agenda is defined as the meta-KS and looks after the other KS structures
which have become part of the agenda structure (a meta-meta-KS looks after the
meta-KS, and so on). The definition depends on the level from which the system is
being viewed. All planning structures are defined within themselves [16], so that
sub-plans are themselves planning. System adaptation is allowed by copying a KS
into a new structure and modifying that structure. The next time the agenda uses
the structure, the level of interpretation has changed. It is a meta-state change
which causes a new KS to be expressed.
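The adaptation step can be sketched as below: a KS is copied and the copy is modified, so that the agenda interprets it differently the next time. This is our own minimal illustration. A KS is reduced to a name, a list of conditions, a meta level and the member KSs it looks after. An agenda is therefore simply a KS at level 1, and a meta-agenda is a KS at level 2. All names and conditions are hypothetical.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class KS:
    name: str
    conditions: list[str]
    level: int = 0                                     # 0 = KS, 1 = agenda (meta-KS), ...
    members: list["KS"] = field(default_factory=list)  # the KSs this one looks after

    def adapt(self, name: str, extra_condition: str) -> "KS":
        """Replace member `name` by a modified copy; return the untouched original."""
        i = next(i for i, m in enumerate(self.members) if m.name == name)
        original = self.members[i]
        revised = copy.deepcopy(original)
        revised.conditions.append(extra_condition)
        self.members[i] = revised
        return original

ks_move = KS("KS-Bulldozer", ["entry=fire"])
agenda = KS("Agenda", [], level=1, members=[ks_move])
meta_agenda = KS("Meta-Agenda", [], level=2, members=[agenda])

old = agenda.adapt("KS-Bulldozer", "fuel>0")
print(old.conditions)                # → ['entry=fire']
print(agenda.members[0].conditions)  # → ['entry=fire', 'fuel>0']
```

Copying rather than editing in place keeps the earlier structure available. That is one plausible way to let the system compare or revert interpretations.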

3.2  Extensions  Required  to  provide  a  flexible  Strategic  Model 

Figure  1  shows  the  explicit  panels  of  the  strategic  model  which  is  defined  in  terms 
of  the  blackboard  structure  but  is  extended  explicitly  to  deal  with  reflection  in 
terms  of  the  unit  process,  part-of  process  and  the  strategic  system. 

The Agenda-ks is the intermediate process which provides the part-of unit
for the Ks-fine-grain process unit. The Agenda-ks is expressed by the knowledge
source template and provides the same template for further structures. The
Meta-agenda is formed from the Agenda-ks and can be a unit which has both
Ks-fine-grain and Agenda-ks. The part-of unit exists as the Meta-agenda once it
has spawned the Meta-meta-agenda process, which follows the same procedure
as the Agenda-ks. The system as described can view problem definitions and
solutions in the form of the blackboard, the agenda and the knowledge source.
This allows the different strategic processes to be viewed as different reasoning
context definitions. The most reflective process is the part-of process, which is
used to provide the adaptive learning context for the strategic model.
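The spawning of successive agenda levels from one template can be sketched as follows. We assume, for illustration only, that levels are numbered upwards from 0 (Ks-fine-grain). Each spawned unit holds the unit it was formed from as a part. The class Unit and the helper level_name are hypothetical.

```python
from dataclasses import dataclass, field

def level_name(level: int) -> str:
    """Ks-fine-grain, Agenda-ks, Meta-agenda, Meta-meta-agenda, ... (assumed numbering)."""
    if level == 0:
        return "Ks-fine-grain"
    if level == 1:
        return "Agenda-ks"
    return "Meta-" + "meta-" * (level - 2) + "agenda"

@dataclass
class Unit:
    level: int
    parts: list["Unit"] = field(default_factory=list)

    @property
    def name(self) -> str:
        return level_name(self.level)

    def spawn(self) -> "Unit":
        """Form the next level from the same template; self becomes a part of it."""
        return Unit(self.level + 1, parts=[self])

unit = Unit(0)
names = [unit.name]
for _ in range(3):
    unit = unit.spawn()
    names.append(unit.name)
print(names)  # → ['Ks-fine-grain', 'Agenda-ks', 'Meta-agenda', 'Meta-meta-agenda']
```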




Figure  1 


The ks-type-1 can have a number of problem-dependent KSs attached. To
provide solution islands, the system may generate ks-type-2 and so on. The
ks-types can communicate via the agendas and the blackboards. The structure
of a simple bulldozer definition through the strategic model is shown below.

If a move operation specific to the bulldozer is required, the information
needed to move has to be acquired. This information can be established either from
an event or from a KS. If neither an event nor a KS exists for an operation, the
structure needs to be built. This system adaptation is controlled from the agenda
level. Once a possible KS solution has been generated, control is passed back to
the ks-type. Part of the reflective control occurs through the id of the ks-type:
as all the conditions have an id, they all have the control option.


ks-type-1 (bulldozer)
    :triggers ((($EVENT-IS 'setup) level (operation))
               (or $EVENT-IS 'active))
    :precondition (type level active)
    :action (rules :new-rule :id ks-type (bulldozer ret-op)
                   :conditions ($priority-set return-operations)
                   :action
                   (PROPOSE :change-type ($ret-val ret-op)
                            :entry ($ret-entry ret-op)
                            :level ($case-rule ret-op)
                            :attributes ($res-op ret-op)
                            :value ($res-val ret-op))))
    :local variable-returns))
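The lookup-or-build behaviour described before this template can be sketched as follows. The information comes from an event or from an existing KS. If neither exists, the agenda level builds a new KS before control returns to the ks-type. The generated KS here merely records the operation. agenda_build, perform and the state layout are hypothetical.

```python
from typing import Callable

KSFn = Callable[[dict], dict]   # hypothetical: a KS maps a state to a new state

def agenda_build(operation: str) -> KSFn:
    """Hypothetical agenda-level generator for a missing KS."""
    def generated_ks(state: dict) -> dict:
        return {**state, "last-operation": operation}
    return generated_ks

def perform(registry: dict[str, KSFn], events: dict[str, dict],
            operation: str, state: dict) -> dict:
    if operation in events:                 # 1. information available from an event
        return {**state, **events[operation]}
    if operation not in registry:           # 2. no KS either: adapt at the agenda level
        registry[operation] = agenda_build(operation)
    return registry[operation](state)       # 3. control passes back to the ks-type

registry: dict[str, KSFn] = {}
result = perform(registry, {}, "move", {"position": (0, 0)})
print(result["last-operation"], "move" in registry)  # → move True
```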


Figure 2 shows how the links are provided and how the indexing is achieved for
the strategic model. The structure can encompass each component, allowing the
meta-control to look at all of the blackboard. Internally, this view can be the whole
structure, that is, the meta-control, which also operates as a part-of process.


Figure 2 (diagram; labelled components include Control-(n), Control-(n)(n),
Meta-control-(n)(n)(n), Ks-type-(n), Ks-type-(n)-KSI, Agenda-ks-type-(n)(n),
Meta-agenda-ks-type-(n)(n)(n), Blackboard, Blackboard-type-(n) and Meta-Blackboard)


Like OPM, we can view parts of the blackboard, which is itself a blackboard,
and we can request restructuring from a high level. The network perspective is
important at this point of analysis and assists in answering some of the
classification problems in abstraction analysis. The overall model allows: reasoning
links; partial blackboard analysis, updating and the recognition of changes; the
blackboard to remain distributed; the control to remain distributed; levels of
interpretation from the model to the execution (KSI); and static and dynamic
updating from execution, which allows an adaptive learning system.

The network control structure builds the KS from the condition set. The
organisation of the conditions depends on how the KS is to interact with the
overall problem. The process is used to establish possible linked relationships.
In their most abstract definition, strategies depend on their approximation of a
problem. The reasoning has to be supported for explanation, which can be
internal to the unit process, part of the unit, or the whole system. The infinite
model is represented by a reflective reasoning process. Knowledge sources range
from the basic production system to the most complicated meta-agenda.


3.3  Defining  the  strategy 

In most systems a protocol is defined for how a problem can be expressed.
Difficulties arise with this when a problem is not well understood or when more
information is needed; each situation is a symptom of the other. Tutoring systems
provide environments of increasing information when a student is finding
difficulty with a problem [1]. In a problem-solving environment the information may
be available to solve the problem, but the understanding of the information is not
present in a form which can be easily reasoned about. The process of solving a
problem can be presented together with a solution process. It is this solution
process that describes a structure from each process, which may also offer further
problem representation in context.

This idea, in itself, is not new: a similar process is used in case-based reasoning
(CBR) systems [9, 7]. Learning by analogy is a method to improve and solve
similar problem situations. The analogy method of problem solving uses already
solved cases as a domain comparison. The case is modified and/or extended to
solve the new problem. There is a bootstrap problem with this form of problem
solving when there is no case to compare against, although in reality there is
always some level of case rule to compare to. A context of abstraction level is
required to employ the right level of case rule. The context requires not only the
rules but also an appreciation of how and when to use them. This form of context
analysis can take many forms, for example establishing a simple condition as true
or false, or initialising the strategic outline from the system rules.
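The paper gives no retrieval algorithm, so the following is only a generic illustration of the idea. Cases are tagged with an abstraction level. Retrieval prefers the best feature overlap at the requested level. When nothing matches, it climbs to more general levels, where a level-0 case rule always exists; this is our way of sidestepping the bootstrap problem. The overlap measure, the Case class and the case library are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    level: int                # hypothetical abstraction level: 0 = most general
    features: frozenset[str]

def retrieve(cases: list[Case], problem: set[str], level: int) -> Case:
    """Best-overlapping case at `level`; if none overlaps, climb to more general
    levels. A level-0 case is accepted even without overlap (bootstrap)."""
    for lvl in range(level, -1, -1):
        candidates = [c for c in cases if c.level == lvl]
        best = max(candidates, key=lambda c: len(c.features & problem), default=None)
        if best is not None and (best.features & problem or lvl == 0):
            return best
    raise LookupError("no level-0 case rule defined")

library = [
    Case("generic-response", 0, frozenset()),
    Case("grass-fire", 2, frozenset({"fire", "grass", "wind"})),
    Case("forest-fire", 2, frozenset({"fire", "trees"})),
]
print(retrieve(library, {"fire", "trees", "dry"}, level=2).name)  # → forest-fire
print(retrieve(library, {"flood"}, level=2).name)                 # → generic-response
```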

In most planning systems, results are presented in one form only. However,
to appreciate the process of problem solving in an abstract fashion, this process
cannot be fixed; an open structure is required. This open structure requires a
condition analysis process. The first step we take in bringing this system into
focus for implementation is the use of the condition set.

3.4  Defining  the  Condition  Set 

Conditions  are  built  by  the  system  state  which  will  have  degrees  of  dynamic 
interaction.  This  is  determined  by  the  priority  setting  which  decides  critical 
factors  from  the  user  event  definition.  In  defining  the  condition  the  context  of 
the  abstraction  definition  becomes  such  that  the  reflection  looks  at  how  the 
condition  reacts  to  the  system  state. 

Condition-n, as a unit, is the statement that represents n, for example fire or
refuel. To arrive at a condition there must be a before-action (and before the
before-action a condition, and so on): this describes a causal relationship. The usual
condition-action pair definition results, but an action is classified as a
condition. This allows the initial definition of the world we are interested in
to be statements (similar to Horn clauses in Prolog [13]). To start our initial
reflective condition we look for the most basic element, which, in this problem
of definition, is the condition or the action. This method is used to provide the
constructs for building the more complex structure; the building of the structure
is the strategy. By using such a method we are able to reason and present context
in the unit form and in the part-of form. This technique of using reflection is not
unlike the method used in ELEKTRA [4] when interfacing to the blackboard
structure.
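The causal reading of conditions can be sketched by chaining back to the most basic element. In this reading, each condition has a before-action, and that before-action is itself treated as a condition. In Prolog terms, each entry resembles a clause such as fire_contained :- fire_line_cut. The fire-fighting conditions below are hypothetical examples and are not taken from the paper's application.

```python
# Hypothetical causal definitions: each condition names the before-action that
# produces it, and that before-action is itself treated as a condition.
BEFORE: dict[str, str] = {
    "fire-contained": "fire-line-cut",
    "fire-line-cut": "bulldozer-at-fire",
    "bulldozer-at-fire": "bulldozer-refuelled",
}

def causal_chain(condition: str) -> list[str]:
    """Follow before-actions back to the most basic element (one with no cause)
    and return the chain basic-element-first, i.e. in building order."""
    chain = [condition]
    while chain[-1] in BEFORE:
        nxt = BEFORE[chain[-1]]
        if nxt in chain:
            raise ValueError(f"causal cycle at {nxt!r}")
        chain.append(nxt)
    return chain[::-1]

print(causal_chain("fire-contained"))
# → ['bulldozer-refuelled', 'bulldozer-at-fire', 'fire-line-cut', 'fire-contained']
```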

The ELEKTRA system is constructed in such a way that it supports reflective
processing, that is, ELEKTRA programs (sets of production rules) can reason
about themselves in a theory-relative, causally-connected fashion. In terms of
production systems, this means that production rules are allowed to match other
rules and to inspect and reason about their contents. The execution of some
rules, in particular those rules which are assigned the status of meta-rules, causes
changes to the system which are directly felt at the object level. This goes beyond
the normal property of production systems by which the execution (or ‘firing’)
of one rule alters working memory contents, thus enabling some other rules
while disabling others, even though this is one of the main ways in which the
causal connection between levels is achieved. In our strategic model we carry this
process through to the condition level. This allows conditions to be complete
knowledge source structures, although the condition from one level may be
only an expression which returns a value, or a structure which requires further
evaluation.
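A toy version of this reflective property is sketched below. Rules are placed in working memory, so a meta-rule can inspect another rule and disable it. This is not ELEKTRA's implementation, only an illustration of rules matching and changing other rules. The rule names and memory keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]
    enabled: bool = True

@dataclass
class ReflectiveSystem:
    memory: dict = field(default_factory=dict)
    rules: list[Rule] = field(default_factory=list)

    def cycle(self) -> list[str]:
        """One pass over the rules. The rules themselves are visible in working
        memory, so a meta-rule can inspect and change another rule mid-pass."""
        self.memory["rules"] = self.rules
        fired = []
        for rule in self.rules:
            if rule.enabled and rule.condition(self.memory):
                rule.action(self.memory)
                fired.append(rule.name)
        return fired

def ration_water(mem: dict) -> None:
    # hypothetical meta-rule action: reason about another rule and switch it off
    for r in mem["rules"]:
        if r.name == "spread-water":
            r.enabled = False

def spread_water(mem: dict) -> None:
    mem["water"] -= 1   # hypothetical object-level action

system = ReflectiveSystem(
    memory={"fire": True, "water": 0},
    rules=[
        Rule("ration-water", lambda m: m["water"] < 1, ration_water),
        Rule("spread-water", lambda m: m["fire"], spread_water),
    ],
)
print(system.cycle())            # → ['ration-water']
print(system.rules[1].enabled)   # → False
```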

As the strategic reflective process is already built before the condition is passed
to the blackboard, many of the abstraction problems are resolved. The condition
definitions can become part of the blackboard, allowing consistent control to
follow through from one system to another.


3.5  Adding  control  through  Reflection

Using the agenda and the knowledge source structure, we build on ideas from
O-Plan2, NONLIN [14] and OPM, incorporating reflection and abstraction. Planning
systems demand a form of problem abstraction.

We use layered agendas; these define the KS network and are themselves
structured as KSs. The agendas are more abstract, but are still knowledge sources
which are able to reason. One of the problems outlined by Tate [15] is the
choice between modular fine-grain KSs and efficient large KSs. The agenda in KS
representation allows KS clusters for a defined point (sub-task). The agendas
(higher-level KSs) are controlled by a further level, the scheduler. The structure
is easy to visualise in a static mode, or as a snapshot of the “state”. However, in
a reflective system the control of each component is independent and at the
same time has complete control over the system. A heterarchical structure is used
in OPM [8] for further abstraction plane(s). The blackboard model is restrictive
in that KSs can be aware of other knowledge sources only via the database, with
control through the agenda. In a reflective system all structures have control and
are themselves controlled. The hierarchy of a problem is not always a true structure
of the problem, but it is a way of representing a problem solution. In a strategic
system this can be analysed in a number of ways: as a new problem definition, a
solution, an addition to the system, control, success or failure.
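The layering of KS clusters under agendas, and of agendas under a scheduler, can be sketched as two nested choices. The selection rules here are our own simplifying assumptions. The scheduler takes the first agenda whose sub-task is open and which has a ready KS. An agenda takes its first ready KS. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class KS:
    name: str
    ready: Callable[[dict], bool]

@dataclass
class Agenda:
    subtask: str          # the defined point (sub-task) this KS cluster serves
    members: list[KS]

    def choose(self, state: dict) -> Optional[KS]:
        return next((ks for ks in self.members if ks.ready(state)), None)

def schedule(agendas: list[Agenda], state: dict) -> Optional[str]:
    """The scheduler controls the agendas; each agenda controls its own cluster."""
    for agenda in agendas:
        if agenda.subtask in state["open-subtasks"]:
            ks = agenda.choose(state)
            if ks is not None:
                return f"{agenda.subtask}/{ks.name}"
    return None

agendas = [
    Agenda("contain-fire", [KS("KS-Bulldozer", lambda s: s["fuel"] > 0)]),
    Agenda("logistics", [KS("KS-Refuel", lambda s: s["fuel"] == 0)]),
]
state = {"open-subtasks": {"contain-fire", "logistics"}, "fuel": 0}
print(schedule(agendas, state))  # → logistics/KS-Refuel
```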

The user event meta-system creates events when required, either internally or
by external request: this is part of the reflective process. It allows a (sub)system
to recognise a need and create a “system solution”. This uses the building
blocks to present what it can contribute to the (sub)system, drawing all other
possible KSs into action.

As explained earlier, the agenda can be posted as a whole structure. A meta
structure is used to view the system for a global self-analysis, although this will
be organised into groups. The control method uses a NONLIN [14] characteristic,
that is, it allows groups (and subgroups, etc.) to interrogate structures and create
further analysis structures which contain a context value. The context value is
relative to the level of abstraction. This method is used to develop more efficient
solutions, where efficiency is measured by the number of steps required to achieve
a goal.
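The efficiency criterion can be illustrated directly. Among the analysis structures created for the same abstraction level, prefer the one that reaches the goal in the fewest steps. Comparing only within one level is our reading of the context value being relative to the level of abstraction. The Analysis class and the example plans are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Analysis:
    group: str
    level: int          # abstraction level the context value is relative to
    steps: list[str]

def most_efficient(analyses: list[Analysis], level: int) -> Analysis:
    """Fewest steps wins; only analyses at the same level are compared.
    min() returns the first of equally short candidates."""
    same_level = [a for a in analyses if a.level == level]
    if not same_level:
        raise ValueError(f"no analysis at level {level}")
    return min(same_level, key=lambda a: len(a.steps))

analyses = [
    Analysis("direct-attack", 1, ["move", "cut-line", "spray"]),
    Analysis("indirect-attack", 1, ["move", "backburn"]),
    Analysis("respond", 0, ["respond"]),
]
best = most_efficient(analyses, level=1)
print(best.group, len(best.steps))  # → indirect-attack 2
```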

The initial outline structure for the agenda template is given below. The $name
functions are built from the condition class. This class inherits structure from
the trigger language provided in the blackboard interpreter. Once the initial
structure is provided by the system, the strategic network can be used to put the
condition set in place. The whole process may take many passes to provide the
solution framework.

(declare-user-events 'agenda :new)

(setq ks-agenda
      (def-ks
        :name 'ks-agenda
        :trigger '(or ($agenda-setup-is 'set-up)
                      ($improve-agenda-is 'make-new))
        :precondition '($relationship-set 'agenda)
        :action (list (rules :new-rule :id 'agenda
                             :conditions t
                             :action
                             (PROPOSE :change-type 'make-sub-ks
                                      :entry ($create-agenda)
                                      :level ($create-level)
                                      :attribute ($event-contribution)
                                      :value ($set-macro-type))))
        :local-variable-names '()))

The agenda is structured as a knowledge source, which means that the value
context can be represented as a knowledge source. This context representation
can range from part of a KS to a complete KS. This is all part of the abstraction
and reflection processes. Each level is indexed and can be searched and identified.
The condition-action control shows the developing abstraction structure. This
differs from other systems in that the reasoning mechanism allows the structure to
be compartmentalised in a form that can be evaluated as either a unit process or a
part-of unit process. We are required to make structures explicit for the problem
definition representation, which requires knowing how and when to represent
structures, and understanding why.
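The indexing of levels, together with the make-sub-ks proposal in the template, can be sketched as a level index to which an agenda adds new sub-KSs. We assume, purely for illustration, that $create-level places a new KS one level below the agenda that proposes it. KSRecord, LevelIndex and the KS names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class KSRecord:
    name: str
    level: int
    parent: Optional[str] = None

class LevelIndex:
    """Every KS, agendas included, is indexed by level so it can be searched."""

    def __init__(self) -> None:
        self.by_level: dict[int, list[KSRecord]] = defaultdict(list)

    def add(self, rec: KSRecord) -> KSRecord:
        self.by_level[rec.level].append(rec)
        return rec

    def make_sub_ks(self, agenda: KSRecord, name: str) -> KSRecord:
        # analogue of (PROPOSE :change-type 'make-sub-ks ...); "one level down" is assumed
        return self.add(KSRecord(name, agenda.level - 1, agenda.name))

    def find(self, level: int) -> list[str]:
        return [r.name for r in self.by_level.get(level, [])]

index = LevelIndex()
top = index.add(KSRecord("ks-agenda", 2))
sub = index.make_sub_ks(top, "sub-agenda-fire")
index.make_sub_ks(sub, "ks-move")
print(index.find(1), index.find(0))  # → ['sub-agenda-fire'] ['ks-move']
```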

4  Conclusion 

Our organisational method is a reflective model which allows plan repair, provides
decision explanation, and allows new rules to be added. To establish these
attributes, along with problem context and definition, the model needs to view
itself and the problem objectively.

Abstractions  even  in  simple  problems  can  be  difficult  to  define  and  may  not 
provide  a  hierarchy:  this  structure  allows  the  problem  definition  to  move  between 
forms. 




To provide blackboard unification, the agendas are made up of KSs, which may be subagendas. Like other planners, it reduces complexity by providing another level of control. As part of a control method the system can force a value context; this can only occur once parts of the problem have been well defined through the system structure. It may take many passes through the system for parts of the problem to become well defined and for solutions to form. Our model provides mechanisms to: allow statement definition through the condition set, use the natural expert system strategy for solution expression, overcome the blackboard interpreter problems, improve abstraction definition in problems, allow strategic analysis, and provide reflection for an adaptive learning environment with control.
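The idea that an agenda is itself a knowledge source whose entries may be further sub-agendas, each at an indexed level, can be pictured as a composite structure. The sketch below is only an illustration of that reading: the classes `KnowledgeSource` and `Agenda`, their fields and the methods `run` and `find` are hypothetical names, and the `def-ks` form of Craig's interpreter is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

# Hypothetical stand-ins: the def-ks form above belongs to Craig's
# blackboard interpreter, which is not reproduced here.

@dataclass
class KnowledgeSource:
    """A condition-action unit: its action fires when the precondition holds."""
    name: str
    precondition: Callable[[Dict], bool]
    action: Callable[[Dict], None]

@dataclass
class Agenda:
    """An agenda whose entries are KSs or nested sub-agendas, indexed by level."""
    name: str
    level: int
    entries: List[Union[KnowledgeSource, "Agenda"]] = field(default_factory=list)

    def run(self, blackboard: Dict) -> List[str]:
        """Fire applicable KSs in order, descending into sub-agendas.
        Returns the names of the KSs that fired."""
        fired: List[str] = []
        for entry in self.entries:
            if isinstance(entry, Agenda):
                fired.extend(entry.run(blackboard))
            elif entry.precondition(blackboard):
                entry.action(blackboard)
                fired.append(entry.name)
        return fired

    def find(self, level: int) -> List["Agenda"]:
        """Search the agenda tree for every (sub-)agenda at the given level."""
        found = [self] if self.level == level else []
        for entry in self.entries:
            if isinstance(entry, Agenda):
                found.extend(entry.find(level))
        return found

# Usage: a top-level agenda holding one KS and one sub-agenda.
sub = Agenda("refine", level=1, entries=[
    KnowledgeSource("ks-detail", lambda bb: "outline" in bb,
                    lambda bb: bb.update(detail=True)),
])
top = Agenda("agenda", level=0, entries=[
    KnowledgeSource("ks-outline", lambda bb: True,
                    lambda bb: bb.update(outline=True)),
    sub,
])
blackboard: Dict = {}
fired = top.run(blackboard)
print(fired)  # ['ks-outline', 'ks-detail']
```

Treating a sub-agenda as just another entry lets one `run` loop evaluate a structure either as a whole unit or from any part of it, which is the property the text attributes to the compartmentalised structure.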

Our model and blackboard system are both being developed in Common Lisp. We would like to thank Iain Craig, who has allowed us to use his blackboard interpreter as the basis of our research.

References 

1. D. J. Bierman and P. A. Kamsteeg. Elicitation of knowledge with and for intelligent tutoring systems. International Journal of Educational Research, pages 799-807, 1988.

2. R. Cohen, D. Greenberg, D. Hart, and A. Howe. Trial by fire. AI Magazine, 10(3):33-48, fall 1989.

3. I. D. Craig. The blackboard architecture: A definition and its implications. Research report, University of Warwick, 1987.

4. I. D. Craig. ELECTRA: A Reflective Production System. Research report, University of Warwick, 1991.

5. R. Engelmore and T. Morgan. Blackboard Systems. Addison-Wesley, 1988.

6. Giunchiglia. A System for Multi-level Mathematical Reasoning. Artificial Intelligence in Mathematics, 1991.

7. K. J. Hammond. CHEF: A Model of Case-based Planning. Readings in Planning, 1990.

8. F. Hayes-Roth and B. Hayes-Roth. A cognitive model of planning. Readings in Planning, pages 245-262, 1990.

9. B. Lopez and E. Plaza. Case-based Learning of Strategic Knowledge. European Working Session on Learning, Lecture Notes in AI, 1991.

10. E. Plaza. Inference-level reflection in NOOS. IMSA International Workshop on Reflection and Meta-level Architectures, 1992.

11. J. Rice. The Design and Implementation of Poligon, a High-Performance, Concurrent Blackboard System Shell. Report STAN-CS-89-1294, Stanford University, November 1989.

12. B. Smith. Reflection and semantics in Lisp. ACM, pages 23-35, 1983.

13. L. Sterling and E. Shapiro. The Art of Prolog. MIT Press, 1986.

14. A. Tate. Generating Project Networks. Readings in Planning, 1990.

15. A. Tate. O-Plan2: Choice Ordering Mechanisms in an AI Planning Architecture. Workshop on Innovative Approaches to Planning, Scheduling and Control, 1990.

16. D. E. Wilkins. Domain-independent planning: Representation and plan generation. Readings in Planning, 1990.


On the Learning of Rule Uncertainties and
their Integration into Probabilistic Knowledge Bases

Beat Wüthrich

ECRC GmbH, Arabellastr. 17
D-8000 Munich 81, Germany
e-mail: beat@ecrc.de

Abstract. We present a natural and realistic knowledge acquisition and processing scenario. In the first phase a domain expert identifies deduction rules that he thinks are good indicators of whether a specific target concept occurs. Then, in a second knowledge acquisition phase, a learning algorithm automatically adjusts, corrects and optimizes the deterministic rule hypothesis given by the domain expert by selecting an appropriate subset of the rules and by attaching uncertainties to them. Finally, in the running phase of the knowledge base we can arbitrarily combine the learned uncertainties of the rules with uncertain factual information.

1  Introduction 

The aim of this study is twofold. First, we show how the learning techniques for propositional concepts given by [KS90] can be used to learn probabilities of not necessarily propositional deduction rules. Second, we show how to integrate or simulate these learned probabilities in a rule-based calculus which deals quantitatively with vague rule premises [Wüt92b]. Together, this gives a knowledge representation tool for uncertain rule-based reasoning.

We start by explaining why we think that a combination of rule-based reasoning and probabilities is a promising and needed approach to capturing the knowledge of domain experts in a knowledge base system. Humans like to think in terms of disjunctions of conjunctions (e.g. see [Val85, p.560]) or in terms of rules like

$$has\_cancer(x) \leftarrow person(x) \wedge smoker(x) \qquad (1)$$

$$has\_cancer(x) \leftarrow person(x) \wedge \neg does\_sport(x) \qquad (2)$$

$$has\_cancer(x) \leftarrow person(x) \wedge ancestor(y, x) \wedge has\_cancer(y) \qquad (3)$$

However, in many situations, obtaining such rules from a domain expert is not the end of the knowledge acquisition task, since these rules do not express reality precisely enough. For instance, the rules (1), (2) and (3) are only true to a certain degree. This raises the need for a means to deal with the "degreeness" of truth. There are two sources of vagueness. The rule (1) itself holds only to a certain degree, and the premise $smoker(x)$ needed to deduce the conclusion $has\_cancer(x)$ can also be more or less true. For instance, $smoker(hans) : 0$, $smoker(hans) : 0.5$ and $smoker(hans) : 1$ can model, respectively, that hans is a non-smoker, smokes one pack of cigarettes a day, and smokes more than two packs a day. So in many situations the rule-based paradigm alone is not sufficient; we need a way to express uncertainty. Also, a combination of rules and uncertainties can be used to simulate default reasoning in rule-based systems [Ric83]. Note that an uncertainty can be interpreted in different ways. The fact $smoker(hans) : 0.5$ can express either that hans is a moderate smoker, or that he is a smoker or a non-smoker but we have absolutely no hint which of the two possibilities is the right one.

There are basically two ways of dealing with vagueness, either quantitatively or qualitatively. We will adopt the former and try to formalize and handle the knowledge quantitatively. However, quantitative methods have a serious drawback compared with qualitative approaches like the one proposed in [HR83]. As stated there, the main problem with probabilistic reasoning is that people are unwilling to give precise numerical probabilities to outcomes. Therefore, Halpern and Rabin proposed in that study a qualitative, propositional logic of likelihood to deal with the intuitively appealing notion of "likely" without using explicit numbers or probabilities. Since people are unwilling to give precise values to outcomes, it is exceedingly important to show how to eliminate the need for people to estimate probabilities of deduction rules, especially when the task is further complicated by the fact that one has to give numerical values for the eight different cases in which $has\_cancer(x)$ holds: if only the rule body of (1) is true, if exactly the bodies of (1) and (2) are true, and so on. While the heuristic estimation of probabilities of rules seems indeed to overtax human capabilities, an appropriate setting of the truth degree of individual facts or ground atoms seems to be manageable by domain experts. For instance, attaching a number to $exercises(hans, soccer)$ that expresses the degree of truth of this fact is no more difficult than judging whether $exercises(hans, soccer)$ necessarily holds, does not necessarily hold, likely holds, possibly holds and so on, to speak in terms of the language introduced in [HR83].

We will overcome the problem of probability estimation by giving a learning algorithm for probabilities of rules. Furthermore, we will show how these numbers can be dealt with in a rule-based system that also deals with uncertain premises. Together this results in a framework where people can think in terms of (not necessarily propositional) deduction rules and where the reasoning is more precise than in the framework proposed in [HR83]. Our framework allows more precise information because event dependencies can be not only "likely" or "conceivable" as in [HR83] ($p$ occurs likely or conceivably, respectively, if $q$ occurs); a rule $p \leftarrow q : \gamma$ can express any degree of dependence between "$p$ holds" and "$q$ holds". The degree of dependence is expressed by $\gamma \in [0, 1]$. The use of explicit numbers also circumvents a serious problem occurring in Halpern's and Rabin's likelihood logic [HR83, p.318]. Let us interpret "likely" as being greater than or equal to 0.5. Now consider a situation where we toss a fair coin twice, and let $p$ represent "the coin lands heads both times", while $q$ represents "the coin lands tails both times". Then it follows from the likelihood logic that, since $p \vee q$ is likely, either $p$ is likely or $q$ is likely, which is not the case. Using our approach, this problem is avoided. In terms of rules and facts, the situation is $p \leftarrow head(1) \wedge head(2)$, $q \leftarrow \neg head(1) \wedge \neg head(2)$, $head(1) : 0.5$ and $head(2) : 0.5$. Then, using the rule-based calculus proposed in [Wüt92b], we get the probability of event $p$ and that of event $q$ to be 0.25, the probability of $p \vee q$ to be 0.5, the probability of both events occurring, i.e. $p \wedge q$, to be zero, and the probability of $\neg p \vee q$, $p \vee \neg q$, $\neg q$ and also $\neg p$ to be 0.75.
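The coin-toss figures can be checked by brute force. The sketch below enumerates the four possible worlds over the two facts $head(1)$ and $head(2)$, assuming they are independent, and sums the weights of the worlds in which a formula holds. It is not the calculus of [Wüt92b] (which the text says can be approximated in polynomial time); it is an exponential enumeration that happens to give the same numbers here. The names `facts`, `prob`, `p` and `q` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Independent uncertain facts from the example: head(1) : 0.5, head(2) : 0.5.
facts = {"head1": Fraction(1, 2), "head2": Fraction(1, 2)}

def prob(formula) -> Fraction:
    """Total weight of the possible worlds in which `formula` holds.

    A world fixes each fact to true or false; assuming the facts are
    independent, its weight is the product of p (true) or 1 - p (false)."""
    names = list(facts)
    total = Fraction(0)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = Fraction(1)
        for name, value in world.items():
            weight *= facts[name] if value else 1 - facts[name]
        if formula(world):
            total += weight
    return total

def p(w):  # p <- head(1) & head(2)
    return w["head1"] and w["head2"]

def q(w):  # q <- not head(1) & not head(2)
    return not w["head1"] and not w["head2"]

results = {
    "p": prob(p),
    "q": prob(q),
    "p or q": prob(lambda w: p(w) or q(w)),
    "p and q": prob(lambda w: p(w) and q(w)),
    "not p": prob(lambda w: not p(w)),
    "p or not q": prob(lambda w: p(w) or not q(w)),
}
for name, value in results.items():
    print(f"{name}: {value}")
```

Under the 0.5 threshold, $p \vee q$ (1/2) counts as "likely" while neither $p$ nor $q$ (1/4 each) does. The numbers therefore keep apart exactly the cases that the purely qualitative logic conflates.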

Now we come to the delicate task of making some realistic assumptions about the real world. Suppose a domain expert identifies a set of relevant rules, like (1) to (3), in order to define when a concept like $has\_cancer(x)$ occurs for a particular person $x$. Then we distinguish eight cases: either none of the three rule bodies is true for this person, or only the rule body of (1) is true, ..., or only the rule bodies of (1) and (2) are true, or all three rule bodies are true. In each case there is a specific probability of this person having cancer. Hence, in this example, we assume that there are probabilities $\gamma_1, \gamma_2, \gamma_3, \gamma_{12}, \gamma_{13}, \gamma_{23}, \gamma_{123}$ satisfying the following. Whenever we investigate in future a randomly drawn person like hans, for whom all premises like $exercises(hans, soccer)$ or $smoker(hans)$ are either true or false, then $has\_cancer(hans)$ has probability 0 if none of the rule bodies of (1) to (3) is true wrt the substitution $\{x/hans\}$; it has probability $\gamma_1$ if exactly rule body (1) is true wrt $\{x/hans\}$; it has probability $\gamma_{23}$ if precisely the bodies of (2) and (3) are true wrt $\{x/hans\}$; and so on. When saying, for instance, that the body of (3) is true wrt $\{x/hans\}$, we mean that the existential closure of the body of (3) under $\{x/hans\}$ is true, i.e. that $\exists y\,(person(hans) \wedge ancestor(y, hans) \wedge has\_cancer(y))$ is true. Note that the assumption that the numbers $\gamma_{\bar{\imath}}$ exist seems indeed to be realistic, since if we add the rule $has\_cancer(x) \leftarrow person(x)$ then we also capture those persons having cancer for whom none of the rule bodies is true. In this case, however, we assume the existence of the fifteen numbers $\gamma_1, \ldots, \gamma_{1234}$ with their straightforward meaning.

2  Preliminaries 

We define the model of efficient distribution-free learning of probabilistic concepts as it was introduced in [KS90]. This model is a natural and important extension of Valiant's probably approximately correct learning model for (deterministic) concepts [Val84, Nat91].

A probabilistic concept, abbreviated p-concept, over a domain set (or instance space) $X$ is simply a mapping $c : X \to [0, 1]$. For each $a \in X$, we interpret $c(a)$ as the probability that $a$ is a positive example of the p-concept $c$. A learning algorithm attempts to infer something about the underlying target p-concept $c$ solely on the basis of labeled examples $(a, b)$, where $b \in \{0, 1\}$ is a bit generated randomly according to the conditional probability $c(a)$. Thus, examples are of the form $(a, 0)$ or $(a, 1)$, not $(a, c(a))$.

A p-concept class $C$ is a family of p-concepts. On any execution, a learning algorithm attempts to learn a distinguished target p-concept $c \in C$ with respect to a fixed but unknown and arbitrary target distribution $D$ over $X$. The learning algorithm is given access to an oracle $EX$ that behaves as follows: $EX$ first draws a point $a \in X$ randomly according to the distribution $D$. Then with probability $c(a)$, $EX$ returns the labeled example $(a, 1)$, and with probability $1 - c(a)$ it returns $(a, 0)$.
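The oracle $EX$ can be simulated directly. The sketch below assumes, purely for illustration, that $D$ is a finite distribution given as a list of points with sampling weights. The names `make_oracle`, `target` and the two-point domain are hypothetical. Note that the learner only ever sees the bit, never $c(a)$.

```python
import random
from typing import Callable, Hashable, Sequence, Tuple

def make_oracle(domain: Sequence[Hashable], weights: Sequence[float],
                c: Callable[[Hashable], float],
                rng: random.Random) -> Callable[[], Tuple[Hashable, int]]:
    """Build the example oracle EX for target p-concept c under distribution D.

    D is given here (an assumption for illustration) as a finite domain with
    sampling weights. Each call draws a ~ D, then returns (a, 1) with
    probability c(a) and (a, 0) otherwise; c(a) itself is never revealed."""
    def EX() -> Tuple[Hashable, int]:
        a = rng.choices(domain, weights=weights, k=1)[0]
        b = 1 if rng.random() < c(a) else 0
        return a, b
    return EX

# Usage with a hypothetical two-point domain and target p-concept.
rng = random.Random(0)
target = {"ann": 0.9, "bob": 0.2}
EX = make_oracle(["ann", "bob"], [0.5, 0.5], target.__getitem__, rng)
sample = [EX() for _ in range(10_000)]
ann_bits = [b for a, b in sample if a == "ann"]
print(round(sum(ann_bits) / len(ann_bits), 2))  # close to 0.9
```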

In this setting we are interested in inferring a good model of probability with respect to the target distribution. We say that a p-concept $h$ is an $\epsilon$-good model of probability of $c$ with respect to $D$ if $\Pr_{a \in D}[\,|h(a) - c(a)| \le \epsilon\,] \ge 1 - \epsilon$. Hence the value of $h$ must be near that of $c$ on almost all points $a$.
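For a finite, known distribution the $\epsilon$-good condition can be checked exactly. In the learning setting itself $D$ is unknown, so this is only a way to make the definition concrete. The function name `is_eps_good` and the example values are illustrative.

```python
from typing import Callable, Hashable, Sequence

def is_eps_good(h: Callable[[Hashable], float], c: Callable[[Hashable], float],
                domain: Sequence[Hashable], weights: Sequence[float],
                eps: float) -> bool:
    """Exact check of Pr_{a in D}[|h(a) - c(a)| <= eps] >= 1 - eps.

    Assumes D is a finite distribution given by (domain, weights); with an
    unknown D this probability could only be estimated from samples."""
    total = sum(weights)
    close = sum(w for a, w in zip(domain, weights) if abs(h(a) - c(a)) <= eps)
    return close / total >= 1 - eps

# Hypothetical target c and hypothesis h: h is far off only on "cid".
c = {"ann": 0.9, "bob": 0.2, "cid": 0.5}
h = {"ann": 0.88, "bob": 0.25, "cid": 0.9}
domain, weights = ["ann", "bob", "cid"], [9, 10, 1]
print(is_eps_good(h.get, c.get, domain, weights, eps=0.1))   # True
print(is_eps_good(h.get, c.get, domain, weights, eps=0.04))  # False
```

With $\epsilon = 0.1$ the hypothesis may be badly wrong on "cid" because that point carries only 5% of the probability mass. With $\epsilon = 0.04$ the error of 0.05 on "bob" is no longer tolerated.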

Let $C$ be a p-concept class over domain $X$. We say that $C$ is learnable with a model of probability if there is an algorithm $A$ such that for any target p-concept $c \in C$, for any target distribution $D$ over $X$, and for any inputs $\epsilon, \delta > 0$, algorithm $A$, given access to $EX$, halts and with probability at least $1 - \delta$ outputs a p-concept $h$ that is an $\epsilon$-good model of probability for $c$ with respect to $D$. We say that $C$ is polynomially learnable if $A$ runs in time $poly(1/\delta, 1/\epsilon)$, i.e. in time polynomial in $1/\epsilon$ and $1/\delta$.

We need an additional assumption in order to be able to use Kearns' and Schapire's learning model also for non-propositional concepts. Assume we have some rule bodies $d_1(\bar{x}), \ldots, d_m(\bar{x})$. Then oracle $EX$ returns not only a randomly drawn example $\bar{a} \in X_1 \times \ldots \times X_n$ for some $n > 0$ together with the bit $b \in \{0, 1\}$, but also tells us whether $d_i(\bar{a})$ is true or not for $1 \le i \le m$. The closed formula $d_i(\bar{a})$, $\bar{a} = a_1, \ldots, a_n$, is obtained from $d_i(\bar{x})$, $\bar{x} = x_1, \ldots, x_n$, by applying the ground substitution $\{x_1/a_1, \ldots, x_n/a_n\}$. Thus, the oracle $EX$ returns the labeled example $(\bar{a}, b_1, \ldots, b_m, b)$, where $b_1, \ldots, b_m, b \in \{1, 0\}$. This extension implies that if a domain expert gives an indicator $d_i(\bar{x})$, then someone (the oracle or a set of background information in the form of facts and rules, respectively) has to be able to tell us whether $d_i(\bar{a})$ is true or not for some point $\bar{a} \in X_1 \times \ldots \times X_n$. So if we draw, for instance, hans during the learning phase, the oracle decides whether hans is a smoker or not. However, this does not mean that, once the concept has been learned, we cannot assess whether persons have cancer when we are not one hundred percent sure whether they are non-smokers or smokers. In other words, although we will later cope with uncertain premises, we assume certain premises during the learning phase. As we will see, the truth value of the premises is not estimated directly but indirectly, by giving numbers to ground atoms only. The assumption that the oracle has to decide during the learning phase whether a premise is true or not can be justified as follows. Whenever we have a training example and we (the oracle or background information in the form of rules and facts, respectively) cannot decide on the truth of the premises, we simply skip this example and try another one. This implies that if we need $poly(1/\delta, 1/\epsilon)$ training examples to learn a particular concept and we can decide for only a fraction $\beta$ of them whether the premises are satisfied or not, then we actually have to draw $poly(1/\delta, 1/\epsilon)/\beta$ training examples. Consequently, we will learn the uncertainties of the rules on a subset $X'$ of the domain $X$ and assume that the examples from $X'$ behave like the examples from $X$. So we extrapolate from $X'$ to $X$. For the mathematical analysis of the learning algorithm, however, we will assume that the training examples are randomly drawn from $X$ according to the unknown and arbitrary distribution $D$. Note that we do not allow an additional length parameter like the number of relevant indicators (e.g. see [Nat91, section 2.1]). The reason is that a domain expert can realistically be asked to give at most $m$ relevant indicators for some fixed $m$. Moreover, if we allowed this additional complexity parameter, then the class of concepts we are particularly interested in here (vdp-concepts, defined below) would be neither efficiently learnable nor evaluable in polynomial time. Hence, we are really forced to impose an upper bound on the number of potentially relevant indicators.
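The skip-and-redraw policy for undecidable examples can be sketched as a loop around an extended oracle. Here the oracle is hypothetical: it returns `(a, bits, b)`, with `bits` set to `None` when some premise cannot be decided, and the premises are decidable for a fraction `beta` of the draws. The loop keeps drawing until enough usable examples are collected. The observed number of draws per kept example comes out near $1/\beta$, matching the $poly(1/\delta, 1/\epsilon)/\beta$ estimate.

```python
import random
from typing import Callable, List, Optional, Tuple

Bits = Tuple[int, ...]

def collect_decided(EX: Callable[[], Tuple[int, Optional[Bits], int]],
                    needed: int) -> Tuple[List[Tuple[int, Bits, int]], int]:
    """Draw from an extended oracle until `needed` usable examples are kept.

    EX() is assumed to return (a, bits, b): bits[i] tells whether rule body
    d_{i+1}(a) is true, or bits is None when some premise cannot be decided.
    Undecided examples are skipped. Returns the kept examples and the
    total number of draws made."""
    kept: List[Tuple[int, Bits, int]] = []
    draws = 0
    while len(kept) < needed:
        a, bits, b = EX()
        draws += 1
        if bits is not None:
            kept.append((a, bits, b))
    return kept, draws

# Hypothetical oracle: premises are decidable for a fraction beta of the draws.
rng = random.Random(1)
beta = 0.25

def EX() -> Tuple[int, Optional[Bits], int]:
    a = rng.randrange(1000)
    bits = (rng.randrange(2), rng.randrange(2)) if rng.random() < beta else None
    b = rng.randrange(2)
    return a, bits, b

kept, draws = collect_decided(EX, needed=2000)
print(len(kept), round(draws / len(kept), 1))  # 2000 and a ratio near 1/beta = 4
```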

We now discuss syntactic restrictions on the rule bodies. We tacitly assume that the given rule bodies $d_1(\bar{x}), \ldots, d_m(\bar{x})$ are function-free conjunctions of literals, all having the same free variables. We also assume that the domain $X_1 \times \ldots \times X_n$, from which the oracle randomly draws the values for $\bar{x} = x_1, \ldots, x_n$, is explicitly encoded in each rule body in the form $dom(x_1) \wedge \ldots \wedge dom(x_n)$, i.e. as an additional conjunction of atoms. In rules (1) to (3) this conjunction is just $person(x)$. We also require that the rule bodies are allowed (each head variable occurs in a positive body literal). This has no consequence for the learning itself, but it ensures good properties when the learned probabilities are afterwards combined with a special rule-based calculus.

In this study we are particularly interested in the class $C$ of visible disjunctive p-concepts with respect to $\{d'_1(\bar{x}), \ldots, d'_m(\bar{x})\}$, abbreviated vdp-concepts (wrt $\{d'_1(\bar{x}), \ldots, d'_m(\bar{x})\}$). When appropriate we write $d_i$ instead of $d_i(\bar{x})$. Each rule body $d'_i$ obeys the previously defined restrictions. A vdp-concept $c \in C$ wrt $\{d'_1(\bar{x}), \ldots, d'_m(\bar{x})\}$ is defined by a set of $n$ rule bodies $\{d_1, \ldots, d_n\} \subseteq \{d'_1, \ldots, d'_m\}$ and by $2^n - 1$ real numbers $\gamma_{i_1}$ $(1 \le i_1 \le n)$, $\gamma_{i_1 i_2}$ $(1 \le i_1 < i_2 \le n)$, $\gamma_{i_1 i_2 i_3}$ $(1 \le i_1 < i_2 < i_3 \le n)$, ..., $\gamma_{i_1 i_2 \ldots i_n}$ $(1 \le i_1 < i_2 < \ldots < i_n \le n)$. Each real value $\gamma_{\bar{\imath}}$ lies in the interval $[0, 1]$. We use the convention $\gamma_{\bar{\imath}} = \gamma_{\bar{\imath}'}$ if $\bar{\imath}'$ is a permutation of the indices $\bar{\imath}$. For a point $\bar{a} \in X$, we have $c(\bar{a}) = \gamma_{12 \ldots k}$, where $\{d_1(\bar{a}), d_2(\bar{a}), \ldots, d_k(\bar{a})\}$ is the maximal subset of $\{d_1(\bar{a}), \ldots, d_n(\bar{a})\}$ such that each $d_i(\bar{a})$ for $1 \le i \le k$ is true. Otherwise, if no disjunct is true wrt $\bar{a}$, then $c(\bar{a}) = 0$.
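Evaluating a vdp-concept at a point amounts to looking up the $\gamma$ indexed by the set of true disjuncts. The sketch below keys the $2^n - 1$ numbers by frozensets of 1-based indices, so $\gamma_{12}$ and $\gamma_{21}$ are automatically the same entry, as the permutation convention requires. The rule bodies are modelled as Python predicates over a hypothetical person record, with $person(x)$ taken as given. The $\gamma$ values are made up.

```python
from typing import Callable, Dict, FrozenSet, Sequence

def vdp_concept(bodies: Sequence[Callable[[dict], bool]],
                gamma: Dict[FrozenSet[int], float]) -> Callable[[dict], float]:
    """Return c for a vdp-concept with rule bodies d_1..d_n and the 2^n - 1
    numbers gamma. Keys are frozensets of 1-based indices, so gamma_12 and
    gamma_21 are the same entry, matching the permutation convention."""
    n = len(bodies)
    if len(gamma) != 2 ** n - 1:
        raise ValueError(f"need {2 ** n - 1} gamma values, got {len(gamma)}")

    def c(a: dict) -> float:
        true_set = frozenset(i + 1 for i, d in enumerate(bodies) if d(a))
        return gamma[true_set] if true_set else 0.0  # no disjunct true: c(a) = 0
    return c

# Bodies of rules (1) and (2) for a person record; person(x) is taken as given.
def d1(p: dict) -> bool:
    return p["smoker"]

def d2(p: dict) -> bool:
    return not p["does_sport"]

gamma = {frozenset({1}): 0.3, frozenset({2}): 0.1, frozenset({1, 2}): 0.4}  # hypothetical
c = vdp_concept([d1, d2], gamma)
hans = {"smoker": True, "does_sport": False}
eva = {"smoker": False, "does_sport": True}
print(c(hans), c(eva))  # 0.4 0.0
```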

3  Learning  Rule  Uncertainties 

Proposition 1 Visible disjunctive probabilistic concepts wrt $\{d'_1, \ldots, d'_m\}$ are polynomially learnable.

Proof. The learning algorithm, taking as input $\delta$ and $\epsilon$, makes direct use of a technique suggested in [KS90, p.386] and proceeds in two steps. First, for each of the $2^m$ possible hypotheses $h = \{d_1, \ldots, d_n\}$ wrt $\{d'_1, \ldots, d'_m\}$ we do the following. We estimate $\gamma_{1 \ldots l}$ as the frequency with which the target concept occurs in a randomly drawn new sample of size $poly(1/\delta, 1/\epsilon)$ when $d_1 \wedge \ldots \wedge d_l \wedge \neg d_{l+1} \wedge \ldots \wedge \neg d_n$ is true. If this disjunct is never true, we can set $\gamma_{1 \ldots l}$ to any real number in $[0, 1]$. This yields an estimate $\hat{h}$ for each of the $2^m$ hypotheses $h$. Second, for each of the $2^m$ hypotheses $h$ we estimate its error using the quadratic loss method proposed in [KS90, p.386] and choose the best possible hypothesis. This is done using the loss function $Q_h$ of a hypothesis $h$, which is specified for an example $(x, y)$ as $Q_h(x, y) = (h(x) - y)^2$. For each individual $h$ we draw $poly(1/\delta, 1/\epsilon)$ many new examples $(x_1, y_1), \ldots, (x_n, y_n)$ randomly from $X$ and use this sample to estimate the expected loss $E[Q_h]$ as $\hat{E}[Q_h] = \frac{1}{n}\,(Q_h(x_1, y_1) + \ldots + Q_h(x_n, y_n))$. Then we output as $\epsilon$-good model of probability a hypothesis $h$ with minimal empirical loss $\hat{E}[Q_h]$. The proof that the algorithm performs as claimed is given in [Wüt92a]. $\Box$
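The two steps of the proof can be sketched in a few dozen lines. In this sketch, fixed sample sizes `n_fit` and `n_test` stand in for the $poly(1/\delta, 1/\epsilon)$ bounds (no claim is made about what those bounds are), indices are 0-based, and an unseen truth pattern gets the arbitrary value 0.5. The target concept and oracle are hypothetical: bodies 0 and 1 matter and body 2 is an irrelevant indicator supplied by the "expert". As in the proof, each hypothesis gets fresh samples for fitting and for loss estimation.

```python
import random
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Example = Tuple[Tuple[int, ...], int]   # ((b_1, ..., b_m), b), indices 0-based here

def true_pattern(bits: Tuple[int, ...], H: FrozenSet[int]) -> FrozenSet[int]:
    """Indices of the bodies in hypothesis H that are true for this example."""
    return frozenset(i for i in H if bits[i])

def estimate_gammas(H: FrozenSet[int], sample: List[Example]) -> Dict[FrozenSet[int], float]:
    """Step 1: gamma_S = frequency of b = 1 among examples whose true pattern is exactly S."""
    counts: Dict[FrozenSet[int], List[int]] = {}
    for bits, b in sample:
        S = true_pattern(bits, H)
        if S:
            pos_tot = counts.setdefault(S, [0, 0])
            pos_tot[0] += b
            pos_tot[1] += 1
    gamma = {}
    for r in range(1, len(H) + 1):
        for S in map(frozenset, combinations(sorted(H), r)):
            pos, tot = counts.get(S, (0, 0))
            gamma[S] = pos / tot if tot else 0.5  # pattern never seen: any value in [0, 1]
    return gamma

def predict(H, gamma, bits) -> float:
    S = true_pattern(bits, H)
    return gamma[S] if S else 0.0

def learn_vdp(EX: Callable[[], Example], m: int, n_fit: int, n_test: int):
    """Step 2: pick the hypothesis with minimal empirical quadratic loss.
    n_fit and n_test stand in for the poly(1/delta, 1/eps) sample sizes."""
    best = None
    for r in range(m + 1):
        for H in map(frozenset, combinations(range(m), r)):   # all 2^m hypotheses
            gamma = estimate_gammas(H, [EX() for _ in range(n_fit)])
            test = [EX() for _ in range(n_test)]              # fresh sample per hypothesis
            loss = sum((predict(H, gamma, bits) - b) ** 2 for bits, b in test) / n_test
            if best is None or loss < best[0]:
                best = (loss, H, gamma)
    return best

# Hypothetical target: bodies 0 and 1 matter, body 2 is an irrelevant indicator.
rng = random.Random(42)
TARGET = {frozenset({0}): 0.8, frozenset({1}): 0.6, frozenset({0, 1}): 0.9}

def EX() -> Example:
    bits = tuple(rng.randrange(2) for _ in range(3))
    S = frozenset(i for i in (0, 1) if bits[i])
    return bits, int(rng.random() < (TARGET[S] if S else 0.0))

loss, H, gamma = learn_vdp(EX, m=3, n_fit=5000, n_test=5000)
print(sorted(H), round(loss, 3))
```

The selected hypothesis always contains bodies 0 and 1, since dropping either raises the expected loss by roughly 0.09. Whether the irrelevant body 2 is also kept depends on sampling noise: including it leaves the expected loss unchanged apart from extra estimation error.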

4  Integrating  vdp-Concepts 

We show how a learned vdp-concept can be handled by the rule-based calculus of [Wüt92b]. We first briefly introduce the main characteristics of this calculus. Let $KB = (F, R)$ be a knowledge base consisting of a finite set $F$ of facts or ground atoms, each having attached a probability in the closed real interval between zero and one, and a finite set $R$ of stratified deduction rules. The calculus then computes for each ground formula a probability which reflects whether it is true or false with respect to the given facts and rules. For example, on the rules

$$\begin{aligned}
has\_cancer(x) &\leftarrow person(x) \wedge ancestor(y, x) \wedge has\_cancer(y)\\
ancestor(x, y) &\leftarrow parent(x, y)\\
ancestor(x, y) &\leftarrow parent(x, z) \wedge ancestor(z, y)
\end{aligned}$$

and the facts $\{parent(d, c) : 1,\ parent(c, b) : 0.8,\ has\_cancer(d) : 0.7,\ person(d),\ person(c),\ person(b)\}$, the calculus computes for $has\_cancer(b)$ a probability of 0.56, denoted $I_{KB}(has\_cancer(b)) = 0.56$, and for $ancestor(d, b)$ a probability of 0.8. Note that if no explicit uncertainty is given for some piece of information, as for $person(d)$ or the rules, then this information is taken to be certain. If we interpret each ground atom as denoting a special ground formula (under the minimal model semantics of [ABW88], $has\_cancer(b)$ denotes $person(b) \wedge parent(d, c) \wedge parent(c, b) \wedge has\_cancer(d)$ in our example), then the following properties can be shown for this calculus:

1. $I_{KB}(p) \le I_{KB}(q)$ if $p$ logically implies $q$.

2. $I_{KB}(p) \ge 0$.

3. $I_{KB}(p \vee q) = I_{KB}(p) + I_{KB}(q)$ if $p$ and $q$ are mutually exclusive, i.e. $p \wedge q$ is unsatisfiable.

4. $I_{KB}(p) = 1$ if $p$ is the certain event, i.e. a tautology.

Thus, on the Boolean algebra $([S], \wedge, \vee)$, where $S$ is a set of ground formulas and $[S]$ is $S$ modulo logical equivalence, the mapping $I_{KB}$ is well defined, i.e. $I_{KB}(p) = I_{KB}(q)$ if $p \equiv q$, and $I_{KB}$ defines a probability function in the sense of axiomatic probability theory. Thus, all the usual consequences of probability theory also hold here [Par60, pp. 18-22] when the relation $\subseteq$ between events is replaced by logical implication. Moreover, property 1 implies that the calculus is monotone in the number of derivations for a particular ground atom. For instance, if $has\_cancer(b)$ could also be deduced by $person(b) \wedge smokes(b)$, then the probability of $has\_cancer(b)$ increases. In [Wüt92b] it is also shown how to approximate the function $I_{KB}$ in time polynomial in the number of facts in $F$, and that this calculus reduces to the usual minimal model $M_{F \cup R}$ of the stratified Datalog program $F \cup R$ (see [ABW88]) if each fact in $F$ is certain.
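The example values 0.56 and 0.8, and the monotonicity remark, can be reproduced by a brute-force possible-worlds reading. The sketch below assumes that the uncertain facts are independent, that the probability of a ground atom is the total weight of the worlds whose minimal model contains it, and that the minimal model is computed by naive bottom-up evaluation of the example rules. This is an illustration that agrees with the paper's numbers on this example. It is exponential in the number of uncertain facts and is not the polynomial approximation of [Wüt92b]. The fact $smokes(b) : 0.5$ and the extra rule are hypothetical.

```python
from fractions import Fraction
from itertools import product

def step(m: set) -> set:
    """One round of naive bottom-up evaluation of the example rules."""
    par = {(f[1], f[2]) for f in m if f[0] == "parent"}
    anc = {(f[1], f[2]) for f in m if f[0] == "ancestor"}
    out = set(m)
    out |= {("ancestor", x, y) for x, y in par}           # ancestor(x,y) <- parent(x,y)
    out |= {("ancestor", x, y) for x, z in par for z2, y in anc if z == z2}
    out |= {("has_cancer", x) for y, x in anc             # has_cancer(x) <- person(x),
            if ("person", x) in m and ("has_cancer", y) in m}  # ancestor(y,x), has_cancer(y)
    out |= {("has_cancer", f[1]) for f in m               # hypothetical extra rule:
            if f[0] == "smokes" and ("person", f[1]) in m}     # has_cancer(x) <- person(x), smokes(x)
    return out

def minimal_model(facts: set) -> set:
    model = set(facts)
    while (nxt := step(model)) != model:
        model = nxt
    return model

def probability(atom, sure: set, uncertain: dict) -> Fraction:
    """Weight of the possible worlds whose minimal model contains `atom`,
    assuming the uncertain facts are independent of one another."""
    names = list(uncertain)
    total = Fraction(0)
    for values in product([True, False], repeat=len(names)):
        world, weight = set(sure), Fraction(1)
        for fact, present in zip(names, values):
            weight *= uncertain[fact] if present else 1 - uncertain[fact]
            if present:
                world.add(fact)
        if atom in minimal_model(world):
            total += weight
    return total

sure = {("person", "d"), ("person", "c"), ("person", "b"), ("parent", "d", "c")}
uncertain = {("parent", "c", "b"): Fraction(8, 10), ("has_cancer", "d"): Fraction(7, 10)}
print(float(probability(("has_cancer", "b"), sure, uncertain)))    # 0.56
print(float(probability(("ancestor", "d", "b"), sure, uncertain)))  # 0.8

# Monotonicity: a second derivation via hypothetical smokes(b) : 0.5 raises the value.
uncertain2 = {**uncertain, ("smokes", "b"): Fraction(1, 2)}
print(float(probability(("has_cancer", "b"), sure, uncertain2)))   # 0.78
```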

We now outline how to simulate a learned vdp-concept $c$ in the calculus introduced. For a vdp-concept defined by $n$ rules we solve a system of $2^n - 1$ non-linear equations in $2^n - 1$ unknowns $x_{\bar{\imath}}$. In our example, we solve

$$\begin{aligned}
\gamma_1 &= x_1 x_{12} x_{13} x_{123}\\
\gamma_2 &= x_2 x_{12} x_{23} x_{123}\\
\gamma_3 &= x_3 x_{13} x_{23} x_{123}\\
\gamma_{12} &= \gamma_1 + \gamma_2 - x_1 x_2 x_{12} x_{13} x_{23} x_{123}\\
\gamma_{13} &= \gamma_1 + \gamma_3 - x_1 x_3 x_{12} x_{13} x_{23} x_{123}\\
\gamma_{23} &= \gamma_2 + \gamma_3 - x_2 x_3 x_{12} x_{13} x_{23} x_{123}\\
\gamma_{123} &= x_1 x_{12} x_{13} x_{123} + x_2 x_{12} x_{23} x_{123} + x_3 x_{13} x_{23} x_{123}\\
&\quad - x_1 x_2 x_{12} x_{13} x_{23} x_{123} - x_1 x_3 x_{12} x_{13} x_{23} x_{123} - x_2 x_3 x_{12} x_{13} x_{23} x_{123}\\
&\quad + x_1 x_2 x_3 x_{12} x_{13} x_{23} x_{123}
\end{aligned}$$

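Note that if $\gamma_{\bar{\imath}} = 0$, we approximate this by setting $\gamma_{\bar{\imath}}$ to an arbitrarily small but non-zero value. These equations have exactly one solution provided they are friendly defined, i.e. no divisor happens to be zero and none of the $\gamma_{\bar{\imath}}$ is zero. Clearly, the case that a divisor is zero can also be approximated arbitrarily precisely by replacing it by a value greater than zero. Now assume that these equations have been solved, yielding in our example

$$x_1 = \frac{\gamma_1 + \gamma_2 + \gamma_3 - \gamma_{12} - \gamma_{13} - \gamma_{23} + \gamma_{123}}{\gamma_2 + \gamma_3 - \gamma_{23}}, \qquad x_{23} = \frac{(\gamma_1 + \gamma_3 - \gamma_{13})(\gamma_1 + \gamma_2 - \gamma_{12})}{\gamma_1\,(\gamma_1 + \gamma_2 + \gamma_3 - \gamma_{12} - \gamma_{13} - \gamma_{23} + \gamma_{123})},$$

$$x_{123} = \frac{\gamma_1 \gamma_2 \gamma_3\,(\gamma_1 + \gamma_2 + \gamma_3 - \gamma_{12} - \gamma_{13} - \gamma_{23} + \gamma_{123})}{(\gamma_1 + \gamma_2 - \gamma_{12})(\gamma_1 + \gamma_3 - \gamma_{13})(\gamma_2 + \gamma_3 - \gamma_{23})},$$

and so on, if each left-hand side is non-zero. Then we generate the rules

$$\begin{aligned}
has\_cancer(x) &\leftarrow person(x) \wedge smoker(x) \wedge a_1 \wedge a_{12} \wedge a_{13} \wedge a_{123}\\
has\_cancer(x) &\leftarrow person(x) \wedge \neg does\_sport(x) \wedge a_2 \wedge a_{12} \wedge a_{23} \wedge a_{123}\\
has\_cancer(x) &\leftarrow person(x) \wedge ancestor(y, x) \wedge has\_cancer(y) \wedge a_3 \wedge a_{13} \wedge a_{23} \wedge a_{123}
\end{aligned}$$

and add the new facts $\{a_1 : x_1,\ a_2 : x_2,\ a_3 : x_3,\ a_{12} : x_{12},\ a_{13} : x_{13},\ a_{23} : x_{23},\ a_{123} : x_{123}\}$. We will not write down here how an arbitrary vdp-concept is simulated, but this, as well as the resulting values for the $x_{\bar{\imath}}$, is easily generalized from the above example. Of more interest is the question whether the presented simulation technique is correct (see [Wüt92a] for the proofs of Propositions 2 and 3).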
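The system for $n = 3$ can be checked mechanically in both directions. The sketch below first computes each $\gamma_S$ from chosen $x_T$ values. It does this by enumerating all $2^7$ truth assignments to the independent facts $a_T$, where simulated rule $i$ needs every $a_T$ with $i \in T$. It then recovers the $x_T$ in closed form. The closed forms are obtained by dividing the "both rules fire" probabilities $\gamma_i + \gamma_j - \gamma_{ij}$ and the "all three fire" probability into one another; $x_1$, $x_{23}$ and $x_{123}$ are the expressions shown above, and the rest follow by symmetry. The $x_T$ values are arbitrary test inputs, and exact fractions make the round trip exact.

```python
from fractions import Fraction
from itertools import combinations, product

# The seven index sets T of a_T for n = 3: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}.
IDX = [frozenset(s) for r in (1, 2, 3) for s in combinations((1, 2, 3), r)]

def gammas_from_x(x):
    """gamma_S = Pr[at least one rule i in S fires], where the simulated rule i
    needs every fact a_T with i in T, and each a_T holds independently with x_T."""
    gamma = {}
    for S in IDX:
        total = Fraction(0)
        for values in product([True, False], repeat=len(IDX)):
            world = dict(zip(IDX, values))
            weight = Fraction(1)
            for T, present in world.items():
                weight *= x[T] if present else 1 - x[T]
            if any(all(world[T] for T in IDX if i in T) for i in S):
                total += weight
        gamma[S] = total
    return gamma

def solve_x(gamma):
    """Closed-form solution of the n = 3 system (requires friendly gammas)."""
    def g(*i):
        return gamma[frozenset(i)]

    def both(i, j):                     # Pr[rules i and j both fire]
        return g(i) + g(j) - g(i, j)

    all3 = g(1) + g(2) + g(3) - g(1, 2) - g(1, 3) - g(2, 3) + g(1, 2, 3)
    x = {}
    for i, j, k in ((1, 2, 3), (2, 1, 3), (3, 1, 2)):
        x[frozenset({i})] = all3 / both(j, k)                           # e.g. x_1
        x[frozenset({j, k})] = both(i, j) * both(i, k) / (g(i) * all3)  # e.g. x_23
    x[frozenset({1, 2, 3})] = (g(1) * g(2) * g(3) * all3
                               / (both(1, 2) * both(1, 3) * both(2, 3)))
    return x

x = dict(zip(IDX, (Fraction(k, k + 1) for k in range(1, 8))))  # arbitrary test values x_T
gamma = gammas_from_x(x)
print(solve_x(gamma) == x)   # True
```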

Proposition 2 (correctness of the simulation of vdp-concepts for certain rule premises) Let $c$ be a vdp-concept defined by $\{d_1, \ldots, d_n\}$ and $\gamma_1, \ldots, \gamma_{1 \ldots n}$ such that the rules $R = \{c \leftarrow d_1, \ldots, c \leftarrow d_n\}$ are stratified and the corresponding system of $2^n - 1$ equalities for the $x_{\bar{\imath}}$ is friendly defined. Let $R'$ be the set of rules $\{c \leftarrow d_1 \wedge a_1 \wedge \ldots \wedge a_{1 \ldots n}, \ldots, c \leftarrow d_n \wedge a_n \wedge \ldots \wedge a_{1 \ldots n}\}$, let $F'$ be the set of facts $\{a_1 : x_1, \ldots, a_{1 \ldots n} : x_{1 \ldots n}\}$, and let $I$ be a Herbrand interpretation (which can be regarded as a set of facts, each with attached probability one) defining whether a disjunct $d_i$ is true or not wrt an example $\bar{a}$. Then for each point $\bar{a}$ and each interpretation $I$, if $R$ is non-recursive or $I = M_{I \cup R} - \{c(\bar{a})\}$, then $I_{(I \cup F', R')}(c(\bar{a})) = \gamma_{1 \ldots k}$, where $\{d_1(\bar{a}), \ldots, d_k(\bar{a})\}$ is the maximal subset of $\{d_1(\bar{a}), \ldots, d_n(\bar{a})\}$ which is true wrt $I$.

5  Combining  Rule  and  Premise  Uncertainties 

We now show what happens if a vdp-concept is evaluated under uncertain premises. Let us take the rule

$$has\_cancer(x) \leftarrow person(x) \wedge ancestor(y, x) \wedge has\_cancer(y) \qquad (4)$$

where we know that it holds in 10 percent of the cases. Furthermore, we have five fact sets

$$\begin{aligned}
F_1 &= \{ancestor(fritz, hans),\ ancestor(heiri, hans),\ has\_cancer(fritz) : 0.8,\ has\_cancer(heiri) : 0.2\}\\
F_2 &= \{ancestor(fritz, hans),\ has\_cancer(fritz) : 0.8\}\\
F_3 &= \{ancestor(heiri, hans),\ has\_cancer(heiri) : 0.2\}\\
F_4 &= \{ancestor(fritz, hans),\ ancestor(heiri, hans),\ has\_cancer(fritz) : 0.8\}\\
F_5 &= \{ancestor(heiri, hans),\ ancestor(fritz, hans),\ has\_cancer(heiri) : 0.2\}
\end{aligned}$$

Using the concepts introduced, we get

$$\begin{aligned}
0.08 &= I_{(F_2, \{(4)\})}(has\_cancer(hans)) = I_{(F_4, \{(4)\})}(has\_cancer(hans))\\
0.02 &= I_{(F_3, \{(4)\})}(has\_cancer(hans)) = I_{(F_5, \{(4)\})}(has\_cancer(hans))\\
0.084 &= I_{(F_1, \{(4)\})}(has\_cancer(hans))
\end{aligned}$$
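These five values follow from simulating the rule uncertainty 0.1 by an extra fact $a : 0.1$ in the body of rule (4), as in the simulation above, and then combining it with the uncertain premises. The sketch below assumes, as Proposition 3 does, that the rule uncertainty is independent of the premise uncertainties; it also treats the facts $has\_cancer(y)$ as mutually independent and takes $person(hans)$ to be certain. The function name `p_has_cancer_hans` is illustrative.

```python
from fractions import Fraction
from itertools import product

RULE_PROB = Fraction(1, 10)   # rule (4) holds in 10 percent of the cases

def p_has_cancer_hans(ancestors, cancer):
    """Pr[has_cancer(hans)] for rule (4) with its uncertainty simulated by an
    extra fact a : 0.1 in the rule body. `ancestors` lists the y with
    ancestor(y, hans) known for sure; `cancer` maps y to the probability of
    has_cancer(y). All uncertain facts are assumed independent."""
    uncertain = {"a": RULE_PROB, **cancer}
    names = list(uncertain)
    total = Fraction(0)
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = Fraction(1)
        for name, present in world.items():
            weight *= uncertain[name] if present else 1 - uncertain[name]
        if world["a"] and any(world.get(y, False) for y in ancestors):
            total += weight
    return total

p8, p2 = Fraction(8, 10), Fraction(2, 10)
fact_sets = {
    "F1": (["fritz", "heiri"], {"fritz": p8, "heiri": p2}),
    "F2": (["fritz"], {"fritz": p8}),
    "F3": (["heiri"], {"heiri": p2}),
    "F4": (["fritz", "heiri"], {"fritz": p8}),
    "F5": (["heiri", "fritz"], {"heiri": p2}),
}
results = {name: p_has_cancer_hans(*args) for name, args in fact_sets.items()}
for name, value in results.items():
    print(name, float(value))
```

For $F_1$ the two ancestors give two derivations that share the same rule fact, so the value is $0.1 \cdot (1 - 0.2 \cdot 0.8) = 0.084$, not $0.08 + 0.02$.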

Proposition 3 gives precise information on what actually happens in the non-recursive case. It also assures the correctness of combining rule uncertainties with vague premises, under the assumption that the rule uncertainties are independent of the premise uncertainties. We write $\Pr[A]$ instead of $I_{(F,R)}(A)$, since it is clear to which facts and rules we refer.




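Proposition 3 (correctness of the simulation of non-recursive vdp-concepts under vague premises) Let $(F, R)$ be a knowledge base in which predicate $c$ does not occur, and let $c$ be a vdp-concept defined by the disjuncts $\{d_1, \ldots, d_n\}$ and the friendly real numbers $\gamma_1, \ldots, \gamma_{1 \ldots n}$. Then for any point $\bar{a}$, we have:

$$\begin{aligned}
\Pr[c(\bar{a})] ={} & \Pr[d_1(\bar{a}) \wedge \neg d_2(\bar{a}) \wedge \ldots \wedge \neg d_n(\bar{a})] \cdot \gamma_1\\
&+ \Pr[\neg d_1(\bar{a}) \wedge d_2(\bar{a}) \wedge \neg d_3(\bar{a}) \wedge \ldots \wedge \neg d_n(\bar{a})] \cdot \gamma_2\\
&+ \ldots\\
&+ \Pr[\neg d_1(\bar{a}) \wedge d_2(\bar{a}) \wedge \neg d_3(\bar{a}) \wedge d_4(\bar{a}) \wedge \neg d_5(\bar{a}) \wedge \ldots \wedge \neg d_n(\bar{a})] \cdot \gamma_{24}\\
&+ \ldots\\
&+ \Pr[d_1(\bar{a}) \wedge \ldots \wedge d_n(\bar{a})] \cdot \gamma_{1 \ldots n}
\end{aligned}$$

and therefore $0 \le \Pr[c(\bar{a})] \le 1$.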

The conclusion of Proposition 3 can be seen as the formula of total probability. If $A_1 \vee \ldots \vee A_n$ is the certain event and the $A_i$ are pairwise mutually exclusive, then, according to probability theory, $\Pr[B] = \sum_{i=1}^{n} \Pr[A_i] \cdot \Pr[B \mid A_i]$ holds. Here the $A_i$ are the possible truth patterns of the disjuncts. Since the $\gamma$'s are the conditional probabilities $\Pr[B \mid A_i]$, and since the conditional probability $\Pr[c \mid \bigwedge_i \neg d_i]$ is by definition zero, the stated fact follows.
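For $n = 2$ the total-probability form of Proposition 3 can be compared with the simulation itself. The sketch below computes $\Pr[c]$ in two ways. The first sums $\Pr[\text{exactly } S \text{ true}] \cdot \gamma_S$ over the truth patterns $S$. The second enumerates possible worlds for the simulated rules $c \leftarrow d_i \wedge a_i \wedge a_{12}$, with $x_1 = (\gamma_1 + \gamma_2 - \gamma_{12})/\gamma_2$, $x_2 = (\gamma_1 + \gamma_2 - \gamma_{12})/\gamma_1$ and $x_{12} = \gamma_1 \gamma_2 / (\gamma_1 + \gamma_2 - \gamma_{12})$; these are the $n = 2$ analogues of the closed forms above. All numbers are made up. For simplicity the premises $d_1$ and $d_2$ are treated as independent. The proposition itself does not need that assumption, only independence between rule and premise uncertainties.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers for n = 2: rule uncertainties and premise probabilities.
gamma = {frozenset({1}): Fraction(3, 10), frozenset({2}): Fraction(1, 10),
         frozenset({1, 2}): Fraction(7, 20)}
pd = {1: Fraction(1, 2), 2: Fraction(1, 4)}   # Pr[d_1(a)], Pr[d_2(a)], taken as independent

def total_probability(gamma, pd):
    """Proposition 3: sum over truth patterns S of Pr[exactly S true] * gamma_S."""
    total = Fraction(0)
    for t1, t2 in product([True, False], repeat=2):
        S = frozenset(i for i, t in ((1, t1), (2, t2)) if t)
        if S:                                   # the all-false pattern contributes 0
            weight = (pd[1] if t1 else 1 - pd[1]) * (pd[2] if t2 else 1 - pd[2])
            total += weight * gamma[S]
    return total

def simulated(gamma, pd):
    """The same quantity via the simulation: rules c <- d_i & a_i & a_12 with
    facts a_1, a_2, a_12 whose probabilities solve the n = 2 system."""
    g1, g2, g12 = gamma[frozenset({1})], gamma[frozenset({2})], gamma[frozenset({1, 2})]
    both = g1 + g2 - g12                         # must be non-zero (friendly gammas)
    probs = {"d1": pd[1], "d2": pd[2],
             "a1": both / g2, "a2": both / g1, "a12": g1 * g2 / both}
    names = list(probs)
    total = Fraction(0)
    for values in product([True, False], repeat=len(names)):
        w = dict(zip(names, values))
        weight = Fraction(1)
        for name, present in w.items():
            weight *= probs[name] if present else 1 - probs[name]
        if (w["d1"] and w["a1"] and w["a12"]) or (w["d2"] and w["a2"] and w["a12"]):
            total += weight
    return total

print(total_probability(gamma, pd), simulated(gamma, pd))   # 27/160 27/160
```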

The presented simulation technique can also be used to express other non-obvious cases. Assume we have three mutually exclusive events $b_1$, $b_2$ and $b_3$ with probabilities $cf_{b_1}$, $cf_{b_2}$ and $cf_{b_3}$, respectively. It is now no longer obvious how to express mutual exclusiveness, as it was, for example, for two mutually exclusive events (see our example "we toss a fair coin twice" given in Section 1). We set $p_1 \leftarrow b_1 \neg b_2 \neg b_3 a_1$, ..., $p_3 \leftarrow \neg b_1 \neg b_2 b_3 a_3$ and attach, for instance, to $a_1$ the uncertainty $1/((1 - cf_{b_2}) \cdot (1 - cf_{b_3}))$. This ensures the desired effect that $\Pr[p_i] = \Pr[b_i] = cf_{b_i}$, $\Pr[p_i \wedge p_j] = 0$, $\Pr[p_i \vee p_j] = \Pr[p_i] + \Pr[p_j]$ and $\Pr[p_1 \vee p_2 \vee p_3] = cf_{b_1} + cf_{b_2} + cf_{b_3}$ for $1 \le i < j \le 3$. So we then look at $p_i$ instead of $b_i$, which results in the desired behavior.
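The effect of this construction can be checked by brute-force enumeration of the eight possible worlds over $b_1, b_2, b_3$. The sketch below assumes, purely for illustration, that the $b_i$ are independent facts with the given certainty factors and that the uncertainty attached to each $a_i$ acts as a multiplicative weight on rule $p_i$; the function name and the example values are hypothetical.

```python
from itertools import product
from math import prod

def exclusive_encoding(cf):
    """Brute-force check of the p_i construction (illustrative sketch).

    cf: certainty factors (cf_b1, ..., cf_bn) of the facts b_i, ASSUMED to be
    independent; the uncertainty attached to a_i is treated as a
    multiplicative weight on rule p_i <- b_i, not b_j (j != i), a_i.
    Returns (Pr[p_i] for each i, Pr[p_i and p_j] for each pair i < j).
    """
    n = len(cf)
    # Uncertainty attached to a_i: 1 / prod_{j != i} (1 - cf_bj).
    weight = [1 / prod(1 - cf[j] for j in range(n) if j != i) for i in range(n)]
    pr_p = [0.0] * n
    pr_pair = {(i + 1, j + 1): 0.0 for i in range(n) for j in range(i + 1, n)}
    for world in product([False, True], repeat=n):
        p_world = prod(cf[j] if world[j] else 1 - cf[j] for j in range(n))
        # Rules whose premise holds: b_i true and all other b_j false.
        fired = [i for i in range(n)
                 if all(world[j] == (j == i) for j in range(n))]
        for i in fired:
            pr_p[i] += p_world * weight[i]
        for i in fired:
            for j in fired:
                if i < j:  # never reached: the premises are mutually exclusive
                    pr_pair[(i + 1, j + 1)] += p_world * weight[i] * weight[j]
    return pr_p, pr_pair

pr_p, pr_pair = exclusive_encoding((0.2, 0.3, 0.1))
print([round(p, 6) for p in pr_p])  # [0.2, 0.3, 0.1]
print(pr_pair)                      # {(1, 2): 0.0, (1, 3): 0.0, (2, 3): 0.0}
print(round(sum(pr_p), 6))          # 0.6
```

Since no world satisfies two premises at once, the disjunction of the $p_i$ simply adds their probabilities.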

6  Conclusions 

This study combines results from three different areas: probability theory, rule-based reasoning and machine learning. In particular, we presented a natural and realistic knowledge acquisition and processing scenario. A domain expert gives a couple of hopefully strong indicators of whether a target concept will occur. A learning algorithm then selects a subset of these rules and attaches weights to them such that the concept will be predicted probably optimally within the bounds of the original rules. Although we do not make any assumption on the correctness of the domain expert's specification, it is clear that the better the original rules are, the better the results the learning algorithm can produce. Finally, we integrate the learned concepts into probabilistic knowledge bases, where we can also give the probability that a concept occurs even when the rule premises are vague. Moreover, different learned concepts and non-learned deterministic rules can be added together, yielding a large uniform knowledge base.

We believe that a prerequisite for a successful real-world application of uncertain information in knowledge bases is to have available a "good" uncertainty calculus as well as "good" uncertainties themselves. We hope we have satisfied both prerequisites.


References 

[ABW88] K. R. Apt, H. A. Blair, and A. Walker. Towards a Theory of Declarative Knowledge. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, chapter 2, pages 89-148. Morgan Kaufmann Publishers, 1988.

[HR83] J. Y. Halpern and M. O. Rabin. A Logic to Reason About Likelihood. In Proc. Annual ACM Symp. on the Theory of Computing, pages 310-319, 1983.

[KS90] M. J. Kearns and R. E. Schapire. Efficient Distribution-free Learning of Probabilistic Concepts. In Proc. 31st IEEE Symposium on Foundations of Computer Science, pages 382-391, 1990.

[Nat91] B. K. Natarajan. Machine Learning: A Theoretical Approach. Morgan Kaufmann, 1991.

[Par60] E. Parzen. Modern Probability Theory and its Applications. John Wiley & Sons, Inc., 1960.

[Ric83] E. Rich. Default Reasoning as Likelihood Reasoning. In Proc. National Conference on Artificial Intelligence (AAAI-83), pages 348-351, 1983.

[Val84] L. G. Valiant. A Theory of the Learnable. Communications of the ACM, 27(11):1134-1142, 1984.

[Val85] L. G. Valiant. Learning Disjunctions of Conjunctions. In Proc. of the 9th International Joint Conference on Artificial Intelligence (IJCAI), pages 560-566, 1985.

[Wut92a] B. Wüthrich. On the Efficient Distribution-free Learning of Rule Uncertainties and their Integration into Probabilistic Knowledge Bases. Technical Report 92-28, ECRC, 1992.

[Wut92b] B. Wüthrich. Probabilistic Knowledge Bases. Submitted to the Journal of Logic Programming, 1992. Extension of "Towards Probabilistic Knowledge Bases" in Springer-Verlag, Lecture Notes in AI, Nr. 624.


Recognition of Functional Dependencies in Data


Robert  Zembowicz  and  Jan  M.  Zytkow 
Department  of  Computer  Science,  Wichita  State  University,  Wichita,  KS  67208 

Abstract. Discovery of regularities in data involves search in many spaces, for instance in the space of functional expressions. If data do not fit any solution in a particular space, much time could be saved if that space were not searched at all. A test which determines the existence of a solution in a particular space, if available, can prevent unneeded search. We discuss a functionality test, which distinguishes data satisfying the functional dependence definition. The test is general and computationally simple. It permits error in data, a limited number of outliers, and background noise. We show how our functionality test works in database exploration within the 49er system as a trigger for the computationally expensive search in the space of equations. Results of tests show the savings coming from application of the test. Finally, we discuss how the functionality test can be used to recognize multifunctions.


1  Introduction:  the  Role  of  Application  Conditions 

Machine discovery systems such as BACON [4], FAHRENHEIT [11], KDW [7], EXPLORA [2,3], FORTY-NINER (49er) [9,10], and many others explore data in search of hidden knowledge. They usually search large spaces of hypotheses. More discoveries can be made when a hypothesis space is larger, but search in a large space takes plenty of resources: time and memory. When 49er explores a database, for instance, even if a single search for an equation takes only a few seconds, when repeated thousands of times it can consume very significant resources. Can unsuccessful search be avoided? What tests can be applied to determine whether there is a chance for a solution in a particular space?

Another reason to avoid the search is when results would not have meaning. Different types of regularities make sense for different types of variables. For instance, even if two variables have numerical values, if the numbers are on the nominal scale rather than on interval or ratio scales, the discovered equations are not meaningful. Similarly, for a nominal attribute it does not make sense to aggregate the values into two or more classes to summarize data in simple contingency tables, because in almost all cases such an a priori grouping makes little sense. Simple conditions, which do not even consider the data but only domain knowledge about the type of variables, can be sufficient. Zembowicz and Zytkow [9] summarize the dependence between variable type and regularity type for the use of 49er. All application tests are evaluated before the corresponding space is searched, so that the number of spurious regularities decreases and/or the efficiency of search is improved.


In  this  article  we  will  concentrate  on  the  functionality  test  that  prevents  an 
unsuccessful  but  costly  search  in  the  space  of  functional  expressions.  Rather  than 
considering  only  types  of  variables,  our  test  analyzes  the  actual  data. 

2  Testing  Functionality 

Empirical equations are efficient summaries of data if the data distribution is close to a functional relationship. In many machine discovery systems, equations form large, potentially unbounded search spaces [1,4,5,6,8,9]. The search in the space of equations is expensive, and it may be repeated many times for different combinations of variables and different ranges of data. It would be a good idea to avoid search on data which are not conducive to functional description. The problem occurs typically in database discovery, where most data follow weak statistical patterns rather than strong functional dependencies, but some data can still be described by equations. Many discovery systems assume functional relationships in data and focus on identifying the best functional expression. This assumption is justified if the data come from well-designed, high-quality experiments in physics or chemistry, but is seldom satisfied in most other cases. However, when strong functional relationships are possible, it would be a serious fault not to notice them when they occur. We need a test which applies to all datasets and is passed only when the data can be approximated by a function.

We will present such a test, which determines whether there is a functional dependency between two given variables: an independent one (called x) and a dependent one (y). If the dependent variable cannot be identified, the test should be run twice, first with y as dependent and then with x as dependent.

2.1 The Definition of Functionality

We  will  use  the  following  mathematical  definition  of  functional  relationship: 

Definition: Given a set DB of pairs $(x_i, y_i)$, $i = 1, \ldots, N$, of two variables x and y, and the range X of x, y is a function of x iff for each $x_0$ in X there is exactly one value of y, say $y_0$, such that $(x_0, y_0)$ is in DB.
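Applied literally, the definition amounts to grouping the y values by x and requiring exactly one distinct value per x. A minimal Python sketch, with a hypothetical helper name and toy data:

```python
from collections import defaultdict

def is_function_strict(pairs):
    """Strict test from the definition: every x value maps to exactly one y."""
    ys_by_x = defaultdict(set)
    for x, y in pairs:
        ys_by_x[x].add(y)
    return all(len(ys) == 1 for ys in ys_by_x.values())

print(is_function_strict([(1, 2), (2, 4), (1, 2)]))     # True
print(is_function_strict([(1, 2), (1, 2.01), (2, 4)]))  # False
```

The second call shows why this form is too brittle for real data: a deviation of 0.01 in a single y value is enough to reject the dataset.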

We could add a second requirement: y should be a continuous function of x. However, discrete data make testing continuity difficult, and missing data frequently result in artificial discontinuities. After conducting many experiments we decided not to use analysis of continuity in our functionality test. The issue of continuity is revisited in the section on Multiple Equation Finder.

2.2  Problems  with  Functionality  in  Real  Data 

It would be easy to check functionality in a given dataset by the strict application of the definition, that is, by checking whether each value of x indeed corresponds to a single value of y. However, in real-world data, because of phenomena collectively called error and noise, strict uniqueness does not occur, and therefore the condition is unrealistic. We want a more relaxed test that will admit data sets which can be reasonably approximated by equations. In particular, we want to admit a deviation in values, so rather than require a single value of y we permit a distribution of values in a relatively small interval. This fits a statistical model $y = f(x) + \eta(r)$, where $f(x)$ describes the relationship between x and y, and $\eta(r)$ incorporates the effects of phenomena like error or noise.

Figure 1a illustrates how error complicates the problem: for the same value of x there can be many different values of y. A test strictly based on the definition would frequently fail even if the data could be described by a function within a small deviation of y. If the error $\delta_y$ of y is known, the comparison of values of y should be related to that error. For example, one can compare $|y_1 - y_2|$ with the error of y, $\delta_y$. This means that the definition of functionality should be modified for our purpose: for each value of x, differences between the corresponding values of y must lie in the range $(-\delta_y, \delta_y)$. However, in many cases the error is unknown; it is typically estimated only for experimental data.
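The relaxed condition can be sketched the same way. Requiring all pairwise differences of the y values for one x to lie in $(-\delta_y, \delta_y)$ is equivalent to requiring $\max(y) - \min(y) < \delta_y$ for that x. The helper name and data below are illustrative only.

```python
from collections import defaultdict

def is_function_with_error(pairs, delta_y):
    """Relaxed test: for each x, all pairwise differences of y lie in
    (-delta_y, delta_y), i.e. max(y) - min(y) < delta_y."""
    ys_by_x = defaultdict(list)
    for x, y in pairs:
        ys_by_x[x].append(y)
    return all(max(ys) - min(ys) < delta_y for ys in ys_by_x.values())

data = [(1, 2.0), (1, 2.05), (2, 4.0), (2, 3.97)]
print(is_function_with_error(data, delta_y=0.1))   # True
print(is_function_with_error(data, delta_y=0.01))  # False
```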

Fig. 1. (a) Effect of error: there are many points that have the same value of x but different y values. Because these data can be described by a (linear) function plus a small error, we need a more permissive definition of functionality. (b) Effect of noise: in addition to data that follow a functional dependency, an additional distribution of points is not correlated with that dependency. These additional points clearly violate the definition of functionality, even if a tolerance for small error is introduced, because in many cases the differences between the values of y for one value of x are large. However, a linear dependency in the data can still be easily noticed.


Another important factor is background noise, which comes from many sources and is common in databases. Even if, due to the noise, all combinations of values are present in the data, the functional relationship may still stand out by a higher concentration of data. Figure 1b shows background noise added to a functional dependency: in addition to data that can be well described by a linear function, a large number of points are randomly distributed.

The possibility of missing data is another factor that adds to the complexity of the problem. For some values of x, data on the functional dependence may not be present. This is common in databases because in many cases there is no control over the data collection process.

Finally, we want to tolerate a small number of outliers, that is, data which stand out from the noise and lie drastically outside the error distribution. Such data could have been entered by error, or they may represent a phenomenon of potential interest. Still, we wish to neglect those data if they are too few to make sense of them.


2.3  Determination  of  the  Uniqueness 

In databases, typically a small discrete set of values is permitted for each attribute. Given a small set X of x values and a small set Y of y values, the product $X \times Y$ is computationally manageable, as is the corresponding frequency table F, which is a mapping $F: X \times Y \to \mathbb{N}$, where $\mathbb{N}$ is the set of natural numbers, and $F(x_0, y_0) = n$ when n is the number of datapoints with $x = x_0$ and $y = y_0$. If the number of values of x and/or y becomes too large to compute the frequency table, the values of x and y can be grouped into bins b(X) and b(Y), respectively.

Aggregating the values of x into bins b(X) of equal size means that the point $(x_0, y_0)$ is replaced by a pair of integer numbers $(k_x, k_y)$ such that $x_0$ is in the range from $x_{min} + k_x \Delta x$ to $x_{min} + (k_x + 1)\Delta x$, and $y_0$ is in the range from $y_{min} + k_y \Delta y$ to $y_{min} + (k_y + 1)\Delta y$, where $x_{min}$ and $y_{min}$ are the smallest values of x and y, respectively. Note that the frequency table F can be interpreted as a grid $b(X) \times b(Y)$ imposed on the data (x, y) and defined by $\Delta x$ and $\Delta y$.
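A minimal Python sketch of this binning, assuming floor-based bin indices measured from $x_{min}$ and $y_{min}$ (so $x_{min}$ itself falls into bin 0) and storing only the non-empty cells of F in a `Counter`; the function name is hypothetical.

```python
from collections import Counter
import math

def frequency_table(pairs, dx, dy):
    """Bin each point (x0, y0) into integer cell indices (k_x, k_y) with
    k_x = floor((x0 - x_min) / dx), k_y = floor((y0 - y_min) / dy), and count
    points per cell. Only non-empty cells appear in the returned Counter."""
    x_min = min(x for x, _ in pairs)
    y_min = min(y for _, y in pairs)
    return Counter(
        (math.floor((x - x_min) / dx), math.floor((y - y_min) / dy))
        for x, y in pairs)

F = frequency_table([(0.0, 0.0), (0.4, 0.1), (1.2, 2.5), (1.9, 2.9)], dx=1.0, dy=1.0)
print(sorted(F.items()))  # [((0, 0), 2), ((1, 2), 2)]
```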

Binning the original variables x and y into bins b(X) and b(Y) is very helpful for determining functionality in data which include error and/or noise. If the grid size $\Delta y$ is comparable to the error $\delta_y$, the requirement on differences between y values corresponding to the same value of x can be replaced by the following: all points lying in the same x-bin $k_x$ must lie in adjacent y-bins (for example, in bins $k_y - 1$, $k_y$, $k_y + 1$). Note that the problem of error is removed only if the bin sizes $\Delta x$ and $\Delta y$ are not smaller than the corresponding errors; otherwise the functionality test may still fail because points in the same x-bin may not lie in adjacent y-bins, even if the original data really follow a functional dependency plus error. On the other hand, if $\Delta x$ and $\Delta y$ are too large, the test could assign functionality to data that intuitively should not be described by a single function. In the extreme case, when $\Delta y$ is larger than $y_{max} - y_{min}$, all points always lie in the same y-bin, because that bin includes all values of y.
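The adjacency criterion on binned data can be sketched as follows. The sketch reads "adjacent y-bins" as a single contiguous run of occupied y-bins within each x-bin, which is one reasonable interpretation of the text; the helper name is hypothetical.

```python
from collections import defaultdict

def adjacent_in_y(cells):
    """Binned functionality check: for every x-bin k_x, the occupied y-bins
    must form one run of adjacent bins (e.g. k_y - 1, k_y, k_y + 1).

    cells: iterable of non-empty cell indices (k_x, k_y).
    """
    ky_by_kx = defaultdict(set)
    for kx, ky in cells:
        ky_by_kx[kx].add(ky)
    # A set of integers is one contiguous run iff max - min + 1 == its size.
    return all(max(s) - min(s) + 1 == len(s) for s in ky_by_kx.values())

print(adjacent_in_y([(0, 3), (0, 4), (1, 5), (1, 6), (1, 7)]))  # True
print(adjacent_in_y([(0, 3), (0, 5)]))                          # False
```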

The problem of background noise can be at least alleviated if, instead of checking the adjacency of all non-empty cells $F(k_x, k_y)$ which have the same value $k_x$, one considers only cells that contain an above-average number of points. This noise subtraction works effectively only if the background noise is not stronger than the regularity itself.

If the errors $\delta_x$ and $\delta_y$ are known, they can be used as the corresponding bin sizes $\Delta x$ and $\Delta y$. But if they are unknown or uncertain, the proper grid sizes must be estimated.




Before we present the algorithm which estimates the grid sizes, let us consider an ideal functional dependence, say a straight line, as in Figure 2a: the data are evenly distributed with constant distance $\Delta$ in x between points. Let us define the "density" $\rho$ as the average number of points in all cells which contain at least one point. As long as the grid size $\Delta x$ is smaller than $\Delta$, $\rho$ is equal to one. The density $\rho$ starts to grow when $\Delta x$ becomes greater than $\Delta$. Note that, as opposed to the true density (the number of data points divided by the total number of cells), $\rho$ does not depend on the number of x-bins in the following sense: if we extend the range of x by adding more evenly distributed points and keep $\Delta x$ constant, then in our ideal example $\rho$ will have exactly the same values. This is important because it means that the "density" measure $\rho$ does not depend on the range of x or y as long as the bin sizes are the same. Therefore $\rho$ can be used to determine $\Delta x$ and $\Delta y$.
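The behaviour of $\rho$ described above, and shown in Figure 2, can be reproduced numerically. The sketch below uses an ideal straight line y = 2x + 1 with $\Delta = 1$ (an illustrative choice) and floor-based binning from the smallest x and y values; the function name is hypothetical.

```python
import math

def rho(points, dx, dy):
    """Average number of points per non-empty cell of a dx-by-dy grid
    anchored at (x_min, y_min); never smaller than 1."""
    x_min = min(x for x, _ in points)
    y_min = min(y for _, y in points)
    cells = {(math.floor((x - x_min) / dx), math.floor((y - y_min) / dy))
             for x, y in points}
    return len(points) / len(cells)

# Ideal straight line y = 2x + 1 with spacing Delta = 1 in x.
line = [(float(i), 2.0 * i + 1.0) for i in range(16)]
print(rho(line, 0.5, 1.0))  # 1.0  (dx = Delta/2)
print(rho(line, 1.0, 2.0))  # 1.0  (dx = Delta, still one point per cell)
print(rho(line, 2.0, 4.0))  # 2.0  (dx = 2 * Delta)
```

Doubling the number of points on the same line (range(32)) leaves all three values unchanged, which is the range-independence property used to choose the grid.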

Determination of the Grid Size. Our algorithm for finding the grid size starts from some small initial sizes $\Delta x$ and $\Delta y$ and then changes these sizes until a criterion for the density of points is satisfied.

Algorithm: Determine grid size
  Δx ← Xρ0/(2N), Δy ← Yρ0/(2N)
  ρ ← N / (# of non-empty cells)
  if ρ < ρ0 then
    repeat
      Δx ← 2Δx, Δy ← 2Δy
      ρ ← N / (# of non-empty cells)
    until ρ ≥ ρ0
  else
    repeat
      Δx ← Δx/2, Δy ← Δy/2
      ρ ← N / (# of non-empty cells)
    until ρ < ρ0
    Δx ← 2Δx, Δy ← 2Δy
  end if
end algorithm

$\rho_0$ is the minimum required "density" of points in non-empty cells, and X and Y denote the ranges of x and y. The initial values of $\Delta x$ and $\Delta y$ are chosen to be $X\rho_0/(2N)$ and $Y\rho_0/(2N)$, respectively, because for monotonic functions with more or less evenly distributed points a grid of $X\rho_0/N$ by $Y\rho_0/N$ would give a density $\rho$ close to $\rho_0$. The additional factor 1/2 was introduced to avoid decreasing the grid size in cases when the initial $\rho$ is only slightly larger than $\rho_0$. Note that decreasing the grid size is more costly than increasing it, because the latter operation can be performed on the existing grid from the previous step, while in the former case the density grid must be built from the data. From the definition of $\rho$ one can see that its value is never smaller than 1. If the grid size is too small, the value of $\rho$ is about 1. When $\rho$ starts to grow and becomes significantly greater than 1, say about 2, there are on average about two points per non-empty cell. At that moment, for most data, one can start analyzing functionality based on that grid size; therefore the best default value for $\rho_0$ is around 2. This default value works very well for more or less evenly distributed data points. However, when the distribution of data is very uneven, $\rho_0$ should be increased.

Fig. 2. Comparison of density for cells which contain data ($\rho$) and true density: (a) sample points lying along a straight line $y = ax + b$; (b) grid with $\Delta x = \Delta/2$, $\rho = 1$, density = 1/16; (c) grid with $\Delta x = \Delta$, still $\rho = 1$ but density = 1/4; (d) grid with $\Delta x = 2\Delta$, $\rho = 2$ and density = 1. Note that the parameter $\rho$ stays the same until $\Delta x$ becomes larger than $\Delta$ or $\Delta y$ larger than $a\Delta$; $\Delta$ is the smallest difference between the x coordinates of points.
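A runnable Python sketch of the grid-size algorithm follows. It rests on stated assumptions: X and Y are taken as the ranges of x and y (both positive), bins are floor-based from $x_{min}$ and $y_{min}$, and an iteration cap `max_steps` (not part of the original algorithm) is added, because the refining loop would never end if duplicate points kept $\rho$ above $\rho_0$. Function names are hypothetical.

```python
import math

def count_nonempty(points, dx, dy, x_min, y_min):
    """Number of dx-by-dy grid cells that contain at least one point."""
    return len({(math.floor((x - x_min) / dx), math.floor((y - y_min) / dy))
                for x, y in points})

def determine_grid_size(points, rho0=2.0, max_steps=60):
    """Sketch of 'Determine grid size' (X, Y read as the ranges of x, y)."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    dx = (max(xs) - x_min) * rho0 / (2 * n)
    dy = (max(ys) - y_min) * rho0 / (2 * n)
    rho = n / count_nonempty(points, dx, dy, x_min, y_min)
    if rho < rho0:
        for _ in range(max_steps):   # coarsen until dense enough
            dx, dy = 2 * dx, 2 * dy
            rho = n / count_nonempty(points, dx, dy, x_min, y_min)
            if rho >= rho0:
                break
    else:
        for _ in range(max_steps):   # refine until too sparse ...
            dx, dy = dx / 2, dy / 2
            rho = n / count_nonempty(points, dx, dy, x_min, y_min)
            if rho < rho0:
                break
        dx, dy = 2 * dx, 2 * dy      # ... then step back one level
    return dx, dy

line = [(float(i), float(i)) for i in range(100)]
dx, dy = determine_grid_size(line)
print(dx, dy, len(line) / count_nonempty(line, dx, dy, 0.0, 0.0))
```

Coarsening reuses nothing here for simplicity; an implementation following the cost remark above would merge cells of the previous grid instead of rebinning the data.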


2.4  Functionality  Test 

The input to the functionality test is a frequency table of data for a grid based on b(X) and b(Y), the binned values of x and y. The grid $b(X) \times b(Y)$ can be (1) built based on the known error, (2) determined by the above algorithm, or (3) obtained from the already binned data. The following algorithm determines whether it is worthwhile to search for an equation that fits the data:




Algorithm: Test functional relationship between x and y
  given the table of actual record counts
  AV ← average number of records per cell
  for each value in b(X)
    find all groups of adjacent cells with counts > AV
    if # of groups > α then
      return NO-FUNCTION
    end if
  end for
  if average # of groups > β then
    return NO-FUNCTION
  else
    return FUNCTION
  end if
end algorithm

The algorithm is controlled by two modifiable parameters, α and β, which are measures of local (α) and global (β) uniqueness in y, that is, of the number of values of y for the same value of x. The default values used by 49er are α = 3 and β = 1.5. For α = 3 the functionality test fails when, for a value in b(X), there are more than 3 groups of adjacent cells with an above-average density of points. This higher value of α solves the problem of rare outliers: there can be up to 2 outlying groups (that is, cells having an above-average number of points) provided this happens very rarely. However, many outliers or frequent discontinuities in y should fail the test; therefore the value of β should be much smaller and close to one.
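A Python sketch of the test on a frequency table keyed by cell indices $(k_x, k_y)$, using the default α = 3 and β = 1.5. Two details the pseudocode leaves open are fixed here by assumption: AV is the mean count over every cell of the bounding grid (empty cells included), and the average number of groups is taken only over x-bins with at least one above-average cell, so that missing data do not pull the average down. Function names are hypothetical.

```python
from collections import Counter

def count_groups(kys):
    """Number of runs of consecutive integers in a set of y-bin indices."""
    return sum(1 for k in kys if k - 1 not in kys)

def functionality_test(freq, alpha=3, beta=1.5):
    """Sketch of 'Test functional relationship between x and y'.

    freq: Counter mapping non-empty cells (k_x, k_y) to record counts.
    Returns (verdict, average number of groups or None).
    """
    kxs = {kx for kx, _ in freq}
    kys = {ky for _, ky in freq}
    n_cells = (max(kxs) - min(kxs) + 1) * (max(kys) - min(kys) + 1)
    av = sum(freq.values()) / n_cells
    group_counts = []
    for kx in sorted(kxs):
        dense = {ky for (cx, ky), c in freq.items() if cx == kx and c > av}
        g = count_groups(dense)
        if g > alpha:                 # local uniqueness violated
            return "NO-FUNCTION", None
        if g > 0:
            group_counts.append(g)
    if not group_counts:
        return "NO-FUNCTION", None
    mean_groups = sum(group_counts) / len(group_counts)
    return ("NO-FUNCTION" if mean_groups > beta else "FUNCTION"), mean_groups

line = Counter({(k, k): 2 for k in range(8)})
two_lines = line + Counter({(k, k + 4): 2 for k in range(8)})
print(functionality_test(line))       # ('FUNCTION', 1.0)
print(functionality_test(two_lines))  # ('NO-FUNCTION', 2.0)
```

The second case passes the local α check in every x-bin but fails the global β check, which is exactly the situation Multiple Equation Finder looks for.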

The values of α and β should be increased for sparse data with strong background noise, while they should be reduced if we are looking for high-precision functional dependencies in experimental data. If the background noise is strong, fluctuations in the noise could result in cells having an above-average number of points coming from pure noise. In that case, considering only cells with an above-average density of points may not always be a cure for the noise, and one should increase the values of α and β to allow more discontinuities in y coming from noise. At the other extreme, if the data result from high-precision experiments with very little noise, discontinuities in y usually reflect the true nature of the dependency in the data (if any) and therefore should be treated more seriously: α and β should be decreased.

3  An  Application:  Discovery  of  Functionality  in  49er 

49er is a machine discovery system that analyzes large databases in search of regularities hidden in those data. In the first phase of its exploration of a database, 49er looks for regularities holding for 2 attributes (variables). We will focus on that problem.

The two basic types of regularities considered by 49er are contingency tables and equations. While analysis of contingency tables is relatively fast (for example, 49er needs about 0.1 s for a typical contingency table regularity), the search for equations is quite costly (3-5 seconds for the same data). Therefore 49er uses domain knowledge about attributes (for example, to avoid search for equations for nominal attributes) and the test for functionality to avoid the search for equations when there is no hope of finding them.

Table 1 presents empirical data which show the advantages of the functionality test on several test runs. While the number of detected regularities was only slightly smaller when the test was applied, the computation time was significantly shorter. In tests 1-4 some 26,000 datasets were examined by 49er. Only a few of them can be reasonably approximated by functional dependencies, and most discovered equations are very weak. Equations "lost" when the test was used belong to the weakest of all discovered equations, and their number varies when one slightly changes α and β. Tests 5 and 6 demonstrate the very small overhead introduced by the application of the test: a fairly strong functional dependence was present in all datasets. Note that this time no equation was lost. Many tests conducted on real and artificially generated data show that only a small percentage of very weak equations may sometimes be lost when the test is used.

Table 1. Comparison of 49er execution times with and without the test for functionality. Tests 1-4 were conducted on about 1400 records, 10 attributes, and an average of 10 values per attribute. 49er considered about 26,000 datasets in those data. The results show that run time was reduced significantly while only about 10% of equations were not discovered. Tests 5 and 6 were applied to about 6000 records, 6 attributes, and from 3 to 50 values per attribute. In tests 5 and 6 49er focused only on pairs of attributes that had some functional dependencies, known or discovered earlier, so that the test almost always resulted in the "yes" answer. Note that execution times for tests 5 and 6 are almost the same; this means that the overhead introduced by the test for functionality can be neglected compared to the time needed by Equation Finder.


Test  Functionality  Number of  CPU time     Comments
no.   test used?     equations
1     yes            71         258 minutes  database of 1400 records
2     no             77         610 minutes  database of 1400 records
3     yes            24          58 minutes  same data but a shallower search
4     no             26          80 minutes  same data but a shallower search
5     yes            23         699 seconds  6000 records, only datasets with functional dependencies
6     no             23         695 seconds  6000 records, only datasets with functional dependencies
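To make the trade-off in Table 1 concrete, the following sketch recomputes, for each pair of runs, the fraction of equations lost when the test is used and the ratio of CPU times (without the test divided by with it), using only the numbers in the table. The function name is hypothetical.

```python
def summarize(runs):
    """Fraction of equations lost and CPU-time ratio (without / with test)."""
    out = {}
    for name, ((eq_with, t_with), (eq_without, t_without)) in runs.items():
        out[name] = (1 - eq_with / eq_without, t_without / t_with)
    return out

# (equations, CPU time) with the test, then without it, copied from Table 1.
table1 = {
    "tests 1/2": ((71, 258), (77, 610)),   # minutes
    "tests 3/4": ((24, 58), (26, 80)),     # minutes
    "tests 5/6": ((23, 699), (23, 695)),   # seconds
}
for name, (lost, ratio) in summarize(table1).items():
    print(f"{name}: {lost:.1%} of equations lost, time ratio {ratio:.2f}")
# tests 1/2: 7.8% of equations lost, time ratio 2.36
# tests 3/4: 7.7% of equations lost, time ratio 1.38
# tests 5/6: 0.0% of equations lost, time ratio 0.99
```

The losses of roughly 8% are in line with the "about 10%" quoted in the caption, and the ratio below 1 for tests 5 and 6 is the small overhead of the test itself.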

The functionality test can also be applied as a tool for discovering functional dependencies between attributes. When requested, 49er can use only this test while analyzing a database. The results are of interest to users looking only for the existence of functional dependencies in data, but can also be used to modify the default search strategy of 49er. The latter application is especially useful for databases with many attributes: the functionality test can be quickly applied to many pairs of attributes, and the results of this preliminary search can be used to direct the search in the most promising directions, without considering combinations of attributes for which the functionality test failed or returned a high average number of groups (compared with β).




4 Discovery of Multifunctions

We will now discuss an extension of the functionality test that leads to the discovery of multifunctions. Data pass our functionality test when for most values of x there is only one continuous group of cells with counts above the average. However, if the test consistently shows two groups of cells for a subrange of x, it could mean that there are two functional dependencies in the data: $y_1 = y_1(x)$ and $y_2 = y_2(x)$.

The existence of two or more functional dependencies in data is practically useful. For example, if one collects data on the weight of African elephants of different ages, one can observe that there seem to be two distinct weight-age dependencies. A closer look could show that there are actually two kinds of African elephants: large bush elephants (Loxodonta africana) and smaller forest elephants (Loxodonta cyclotis). Such a discovery of two or more functional dependencies in data could lead to the discovery of a classification (cf. Piatetsky-Shapiro).

We have implemented a Multiple Equation Finder (called MEF) that uses an extended version of the functionality test to discover many functional dependencies. If the test returns a high average number of groups (above β), MEF checks the continuity of cell groups in the x direction, partitions the data, runs Equation Finder for every group of adjacent cells, and finally tries to extend the range of the discovered equations (if any) and to reduce the number of groups by merging groups that can be described simultaneously by the same equation. A detailed description of group merging goes beyond the scope of this paper and will be presented in another article.

Algorithm: Multiple Equation Finder
  run Test for Functionality
  if average # of groups < β then
    run Equation Finder
  else
    for each group of adjacent high-density cells
      merge the group with others if they are adjacent in x
    end for
    for every group spanning through many subranges of x
      run Equation Finder
    end for
    repeat
      merge two groups when they have similar equations or
        the equation of one group describes data in the other group
    until no merging happened in the last iteration
  end if
end algorithm
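Only the first, purely grid-based part of Multiple Equation Finder can be sketched without Equation Finder itself. The Python fragment below (hypothetical names throughout) collects the groups of above-average cells in each x-bin and merges groups in neighbouring x-bins whose y-ranges overlap or touch, using a small union-find; each resulting partition would then be handed to Equation Finder, which is not modeled, nor is the equation-based merging. AV is again assumed to be the mean count over the bounding grid.

```python
from collections import Counter

def runs(kys):
    """Split a set of y-bin indices into runs of consecutive integers."""
    out = []
    for k in sorted(kys):
        if out and k == out[-1][-1] + 1:
            out[-1].append(k)
        else:
            out.append([k])
    return out

def partition_groups(freq):
    """Partition above-average cells into groups linked across x-bins."""
    kxs = {kx for kx, _ in freq}
    kys = {ky for _, ky in freq}
    av = sum(freq.values()) / ((max(kxs) - min(kxs) + 1) * (max(kys) - min(kys) + 1))
    groups = []  # (k_x, lowest k_y, highest k_y) of each dense run
    for kx in sorted(kxs):
        dense = {ky for (cx, ky), c in freq.items() if cx == kx and c > av}
        groups.extend((kx, r[0], r[-1]) for r in runs(dense))
    parent = list(range(len(groups)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge runs in neighbouring x-bins whose y-ranges overlap or touch.
    for i, (kx_i, lo_i, hi_i) in enumerate(groups):
        for j, (kx_j, lo_j, hi_j) in enumerate(groups):
            if kx_j == kx_i + 1 and lo_j <= hi_i + 1 and lo_i <= hi_j + 1:
                parent[find(i)] = find(j)
    parts = {}
    for i, (kx, lo, hi) in enumerate(groups):
        parts.setdefault(find(i), []).extend((kx, ky) for ky in range(lo, hi + 1))
    return sorted(sorted(cells) for cells in parts.values())

two_lines = Counter({(k, k): 2 for k in range(8)}) + Counter({(k, k + 4): 2 for k in range(8)})
for part in partition_groups(two_lines):
    print(part[0], part[-1], len(part))
# (0, 0) (7, 7) 8
# (0, 4) (7, 11) 8
```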

5  Conclusions 

In this paper we have presented a test which detects the existence of functional dependencies in data. Currently, the functionality test is used (1) in the database mining system 49er as an application condition of the search for equations and (2) to find multiple equations in data coming from databases. In 49er, the test for functionality is always applied to data before Equation Finder starts searching for equations. In Multiple Equation Finder, the test determines whether and how the data should be partitioned so that a single equation can become a search target in each partition.

References 

1. B.C. Falkenhainer & R.S. Michalski: Integrating quantitative and qualitative discovery: the ABACUS system. Machine Learning 1, pp. 367-422 (1986)

2. P. Hoschka & W. Klösgen: A Support System for Interpreting Statistical Data, in: Piatetsky-Shapiro G. & Frawley W. eds., Knowledge Discovery in Databases, Menlo Park, Calif.: AAAI Press (1991)

3. W. Klösgen: Patterns for Knowledge Discovery in Databases, in: Zytkow J. ed., Proceedings of the ML-92 Workshop on Machine Discovery (MD-92), National Institute for Aviation Research, Wichita, Kansas, pp. 1-10 (1992)

4. P. Langley, H.A. Simon, G.L. Bradshaw & J.M. Zytkow: Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge, MA: MIT Press (1987)

5. M. Moulet: ARC.2: Linear Regression in ABACUS, in: Zytkow J. ed., Proceedings of the ML-92 Workshop on Machine Discovery (MD-92), National Institute for Aviation Research, Wichita, Kansas, pp. 137-146 (1992)

6. B. Nordhausen & P. Langley: An Integrated Approach to Empirical Discovery, in: J. Shrager & P. Langley (eds.), Computational Models of Scientific Discovery and Theory Formation, pp. 97-128, Morgan Kaufmann Publishers, San Mateo, CA (1990)

7. G. Piatetsky-Shapiro & C. Matheus: Knowledge Discovery Workbench, in: G. Piatetsky-Shapiro ed., Proc. of AAAI-91 Workshop on Knowledge Discovery in Databases, pp. 11-24 (1991)

8. R. Zembowicz & J.M. Zytkow: Automated Discovery of Empirical Equations from Data, Proceedings of the ISMIS-91 Symposium, Springer-Verlag (1991)

9. R. Zembowicz & J.M. Zytkow: Discovery of Regularities in Databases, in: Zytkow J. ed., Proc. ML-92 Workshop on Machine Discovery, Aberdeen, U.K., pp. 18-27 (1992)

10. J. Zytkow & J. Baker: Interactive Mining of Regularities in Databases, in: Knowledge Discovery in Databases, eds. G. Piatetsky-Shapiro and W. Frawley, Menlo Park, Calif.: AAAI Press (1991)

11. J.M. Zytkow: Combining many searches in the FAHRENHEIT discovery system, Proceedings of the Fourth International Workshop on Machine Learning, Irvine, CA: Morgan Kaufmann, pp. 281-287 (1987)


ROUGH  SET  LEARNING 
OF  PREFERENTIAL  ATTITUDE 
IN  MULTI-CRITERIA  DECISION  MAKING 


Roman  Slowinski 

Institute  of  Computing  Science,  Technical  University  of  Poznan 
60-965  Poznan,  Poland 


Abstract. Rough set theory is a useful tool for the analysis of decision situations, in particular multi-criteria sorting problems. It deals with vagueness in the representation of the decision maker's (DM's) preferences caused by the granularity of the representation. Using the rough set approach, it is possible to learn a set of sorting rules from examples of sorting decisions made by the DM. The rules involve a minimum number of the most important criteria. They do not correct the vagueness manifested in the preferential attitude of the DM; instead, the produced rules are categorized into deterministic and non-deterministic ones. The set of sorting rules explains the decision policy of the DM and may be used to support subsequent sorting decisions. Decision support is provided by matching a new case to one of the sorting rules; if this fails, a set of the 'nearest' sorting rules is presented to the DM. To find the 'nearest' rules, a new distance measure based on a valued closeness relation is proposed.

1  Introduction 

Decision making is one of the most natural acts of human beings. Decision analysis has long attracted scientists, who have offered various mathematical tools for dealing with it. Scientific decision analysis aims to bring to light those elements of a decision situation which are not evident to the actors and may influence their attitude towards the situation. More precisely, the elements revealed by scientific decision analysis either explain the situation or prescribe, or simply privilege, some behavior. The purpose is to increase the coherence between the evolution of the decision process on the one hand and the goals and value systems of the actors on the other (cf. Roy (1985)).

When making real decisions, the decision maker (DM) usually takes into account multiple points of view (criteria) for the evaluation of decision alternatives. However, a multi-criteria decision problem has no solution without additional information about the DM's preferences. Given this preferential information, one can build on the multiple criteria a global preference model which yields 'the best' solution of a given decision problem.

There are two major ways of constructing a global preference model from preferential information obtained from a DM. The first comes from mathematical decision analysis and consists of building a functional or a relational model (cf. Roubens and Vincke (1985)). The second comes from artificial intelligence and builds the model via learning from examples (cf. Michalski (1983)). In the context of artificial intelligence, the preferential information is called knowledge about preferences, and the DM is often called an expert.

In this paper, we are interested in the second way of constructing a global preference model, applied to a multi-criteria sorting problem. The sorting problem consists of assigning each object from a set to an appropriate pre-defined category (for instance: acceptance, rejection, or request for additional information). In this case, the global preference model consists of a set of logical statements (sorting rules) "if ... then ..." describing the preferential attitude of the DM. The set of rules is derived from examples of sorting decisions taken by the DM (expert) on a subset of objects. The multi-criteria sorting of a new object is then supported by matching its description to one of the sorting rules. If this fails, a set of the 'nearest' sorting rules (in the sense of a given distance measure) is presented to the DM.

The preferential information is usually vague (inconsistent) because of different sources of uncertainty and imprecision (cf. Roy (1989)). Vagueness may be caused by the granularity of the representation of preferential information. Due to this granularity, the rules describing the preferential attitude of the DM can be categorized into deterministic and non-deterministic ones. Rules are deterministic if they can be described univocally by means of 'granules' of the representation, and non-deterministic otherwise.

A formal framework for discovering deterministic and non-deterministic rules from a given representation of knowledge was given by Pawlak (1982) and called rough set theory. Rough set theory assumes that knowledge is represented in the form of a decision table, which is a special case of an information system. Rows of this table correspond to objects (actions, alternatives, candidates, patients, etc.) and columns correspond to attributes. For each pair (object, attribute) a value, called a descriptor, is known. Each row of the table contains descriptors representing information about the corresponding object from the universe. In general, the set of attributes is partitioned into two subsets: condition attributes* (criteria, features, symptoms, etc.) and decision attributes (decisions, assignments, classifications, etc.).

The observation that objects may be indiscernible in terms of descriptors is the starting point of the rough set philosophy. Indiscernibility of objects by means of attributes generally prevents their precise assignment to a set. Consider an equivalence relation viewed as an indiscernibility relation; it induces an approximation space made of equivalence classes. A rough set is then a pair consisting of a lower and an upper approximation of a set in terms of the classes of indiscernible objects. In other words, a rough set is a collection of objects which, in general, cannot be precisely characterized in terms of the values of the set of attributes, while a lower and an upper approximation of the collection can be.

Using a lower and an upper approximation of a set (or of a family of sets, i.e. a partition), one can define an accuracy and a quality of approximation. The accuracy of approximation of a set is the ratio of the cardinalities of its lower and upper approximations. The quality of approximation of a partition is the ratio of the number of objects in the lower approximations of its classes to the number of all objects. Both are numbers from the interval [0, 1] which define how exactly one can describe the examined set of objects using the available information. The most complete presentation of rough set theory can be found in Pawlak (1991).

*In decision problems, the concept of criterion is often used instead of condition attribute. It should be noted, however, that the two concepts sometimes have different meanings. The domain (scale) of a criterion has to be ordered according to decreasing or increasing preference, while the domain of a condition attribute need not be ordered. Similarly, the domain of a decision attribute may or may not be ordered.

We shall use the rough set approach to derive sorting rules from examples given by the DM. In the next section, we characterize the methodology, including the use of sorting rules for decision support. In Section 3, we apply the proposed methodology to an example of the selection of candidates to a school. The final section presents conclusions.

2 Rough Set Approach to a Multi-Criteria Sorting Problem

2.1  Problem  definition  and  expected  results 

Given a set of objects described by a number of criteria, the sorting problem consists of assigning each object to an appropriate pre-defined category (for instance: acceptance, rejection, or request for additional information).

Let the preferential information coming from a DM, or an expert, be given in the form of a decision table. In this table, objects correspond to examples, condition attributes to criteria, and the single decision attribute to decisions about assignment to a category. In other words, the rows of the table are examples of sorting decisions related to the corresponding objects.

We expect the following results from the rough set analysis of the above preferential information:

• evaluation of the importance of particular criteria,

• construction of minimal subsets of independent criteria ensuring the same quality of sorting as the whole set, i.e. reducts of the set of criteria,

• identification of the core of criteria: the intersection of those reducts, when non-empty, gives a core of criteria which cannot be eliminated without disturbing the ability to approximate the sorting decisions,

• elimination of redundant criteria from the decision table,

• generation of sorting rules from the reduced decision table; they involve the relevant criteria only and explain the decision policy of the DM (expert).

Of course, the most important result from the viewpoint of decision support is the set of sorting rules. It constitutes a global model of the DM's (expert's) preferences based on the set of examples. In AI terms, the set of examples is a training sample for learning the expert's preferential attitude (cf. Grzymala-Busse (1992)).

The rough set approach has been applied successfully to sorting problems in medicine (Fibak et al. (1986), Slowinski et al. (1988)), pharmacy (Krysinski (1990)), technical diagnostics (Nowicki et al. (1992)) and many other fields. The most complete list of references can be found in Slowinski, ed. (1992).

2.2  Decision  support  using  sorting  rules 

The sorting of a new object can be supported by matching its description to one of the sorting rules. The matching may lead to one of four situations:

(i) the new object matches exactly one deterministic sorting rule,

(ii) the new object matches exactly one non-deterministic sorting rule,

(iii) the new object does not match any of the sorting rules,

(iv) the new object matches more than one rule.

In (i), the sorting suggestion is direct. In (ii), however, the suggestion is no longer direct, since the matched rule is ambiguous. In this case, the DM is informed about the number of sorting examples which support each possible category. This number is called a strength. Suppose the strength of one category is greater than the strengths of the other categories occurring in the non-deterministic rule. One can then conclude that, according to this rule, the considered object most likely belongs to the strongest category.
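A minimal sketch of the strength computation follows. It assumes that a rule is stored as a mapping from criterion names to required values, and that each sorting example is a (description, category) pair. The function names `strength` and `strongest_category` and the toy examples are hypothetical and are not taken from the paper.

```python
from collections import Counter


def strength(conditions, examples):
    """Count, per category, the sorting examples that satisfy every condition of a rule."""
    counts = Counter()
    for description, category in examples:
        if all(description.get(criterion) == value for criterion, value in conditions.items()):
            counts[category] += 1
    return counts


def strongest_category(counts):
    """Return the category with strictly the greatest strength, or None if there is none."""
    ranked = counts.most_common()
    if not ranked:
        return None  # the rule is supported by no example
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the strongest categories
    return ranked[0][0]


# Hypothetical sorting examples (criterion values -> category).
examples = [
    ({"c1": 4, "c3": 4}, "A"),
    ({"c1": 4, "c3": 4}, "A"),
    ({"c1": 4, "c3": 4}, "R"),
    ({"c1": 5, "c3": 4}, "A"),
]
counts = strength({"c1": 4, "c3": 4}, examples)
print(dict(counts), strongest_category(counts))  # {'A': 2, 'R': 1} A
```

Returning `None` on a tie mirrors the text: a suggestion is only made when one category is strictly stronger than the others.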

Situation (iii) is more burdensome. In this case, one can help the DM by presenting a set of the rules 'nearest' to the description of the new object. The notion of 'nearest' involves the use of a distance measure. In this paper, we present one known distance measure and define a new one having some desirable properties.

A sorting rule may have fewer conditions (criteria) than the description of an object to be sorted. The distance between the sorting rule and the object is therefore computed only for the criteria represented in the rule. In other words, it is assumed that there is no difference on the other criteria.

Let a given new object $x$ be described by the values $c_1(x), c_2(x), \ldots, c_m(x)$ ($m \le \mathrm{card}(C)$) of the criteria occurring in the condition part of the sorting rule $y$. The rule $y$ is described by the values $c_1(y), c_2(y), \ldots, c_m(y)$ of the same $m$ criteria.

According to the first definition (cf. Stefanowski (1992), Slowinski and Stefanowski (1992)), the distance of object $x$ from rule $y$ is equal to

$$D = \left\{ \sum_{l=1}^{m} k_l \left[ \frac{|c_l(x) - c_l(y)|}{v_{l\max} - v_{l\min}} \right]^{p} \right\}^{1/p}$$

where: $p = 1, 2, \ldots$ is a natural number to be chosen by the analyst,

$v_{l\max}$, $v_{l\min}$ are the maximal and minimal values of $c_l$, respectively,

$k_l$ is the importance coefficient of criterion $c_l$.
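The measure can be computed directly. The sketch below assumes our reading of the formula above, in which $k_l$ multiplies each normalized partial difference raised to the power $p$. The function name `distance` and the criterion values on a 3 to 5 scale are hypothetical. The example also shows how the choice of $p$ changes the effect of one large versus several small differences.

```python
def distance(x, y, v_max, v_min, k, p=1):
    """Distance of object x from rule y over the m criteria in the rule's condition part.

    x, y: values c_l(x), c_l(y) of the same m criteria;
    v_max, v_min: maximal and minimal value of each criterion;
    k: importance coefficient of each criterion; p: natural number chosen by the analyst.
    """
    total = 0.0
    for xl, yl, hi, lo, kl in zip(x, y, v_max, v_min, k):
        normalized = abs(xl - yl) / (hi - lo)  # partial difference scaled to [0, 1]
        total += kl * normalized ** p
    return total ** (1.0 / p)


# Hypothetical rule and objects on three criteria with scale {3, 4, 5}.
rule = (5, 4, 4)
v_max, v_min, k = (5, 5, 5), (3, 3, 3), (1, 1, 1)
one_big = (3, 4, 4)    # one major difference
two_small = (4, 5, 4)  # two minor differences

for p in (1, 2):
    print(p, distance(one_big, rule, v_max, v_min, k, p),
          round(distance(two_small, rule, v_max, v_min, k, p), 4))
# 1 1.0 1.0
# 2 1.0 0.7071
```

With $p = 1$ both objects are equally distant, so the single major difference is fully compensated by the two minor ones. With $p = 2$, the object with one major difference is the farther one.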




Let us notice that, depending on the value of $p$, the distance measure is more or less compensatory. Moreover, the greater the value of $p$, the greater the importance of the largest partial difference in $D$. The importance coefficients can be determined either subjectively or by taking into account the sorting ability of the criteria. For example, the coefficient of a criterion can be taken as the difference between the quality of sorting for the set of all considered criteria and the quality of sorting for the same set without that criterion.

Some restrictions on the distance between a given object and a rule can also be introduced, e.g. a maximal number of criteria on which the object and the rule differ.

The above definition, however, has some weak points. In particular, for low values of $p$, it allows a major difference on one criterion to be compensated by a number of minor differences on other criteria. For high values of $p$, it tends to overvalue the greatest partial difference. For $p = \infty$, it ignores all other partial differences altogether.

This criticism leads us to define a new distance measure based on a so-called valued closeness relation $R$, which is inspired by the outranking relation introduced by Roy (1985).

The new object $x$ is compared to each sorting rule $y$ in order to assess the credibility of the affirmation "$x$ is close to $y$", denoted by $xRy$. The calculation of the credibility $r(x,y)$ of this affirmation is based on common sense. The formula determining the value of $r(x,y)$ over the interval [0, 1] is constructed so as to respect certain qualitative principles and, in particular, excludes the possibility of undesired compensation. Credibility $r(x,y) = 1$ if the assertion $xRy$ is well-founded; $r(x,y) = 0$ if there is no argument for the closeness of $x$ to $y$. The formula for calculating $r(x,y)$ is essentially based on two concepts, called concordance and discordance. The goals of these concepts are to:

• characterize a group of criteria (conditions) considered to be in concordance with the affirmation being studied, and assess the relative importance of this group of criteria compared with the remainder of the $m$ criteria,

• characterize, among the criteria which are not in concordance with the affirmation being studied, the ones whose opposition is strong enough to reduce the credibility that would result from taking into account just the concordance, and calculate the resulting reduction.

To be able to carry out such calculations, one must first express explicitly and numerically:

• the relative importance $k_l$ that the DM wishes to confer on criterion $c_l$ in the calculation of concordance,

• the minimum value of discordance which gives criterion $c_l$ the power to take all credibility away from the affirmation of closeness, even when all the other criteria are in concordance with the affirmation. It is denoted by $v_l[c_l(x)]$ and called the veto threshold of criterion $c_l$.


The global concordance index is defined as

$$C(x,y) = \frac{\sum_{l=1}^{m} k_l \, c_l(x,y)}{\sum_{l=1}^{m} k_l}$$

where $c_l(x,y)$ is the partial concordance index for criterion $c_l$. Calculation of $c_l(x,y)$ involves two thresholds, $0 \le q_l[c_l(x)] \le p_l[c_l(x)]$, called the indifference and strict difference thresholds, respectively. The definition of $c_l(x,y)$ and of the discordance index $D_l(x,y)$ for criterion $c_l$ is given graphically in Fig. 1.


Fig. 1. Concordance and discordance indices for object $x$ and rule $y$, with respect to criterion $c_l$

The degree of credibility $r(x,y)$ of the closeness relation $xRy$ is obtained from the global concordance index weakened by the discordance indices (up to the point of its annulment):

$$r(x,y) = C(x,y) \prod_{l \in L} \frac{1 - D_l(x,y)}{1 - C(x,y)}, \qquad L = \{\, l : D_l(x,y) > C(x,y) \,\}$$
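A minimal sketch of $r(x,y)$ follows. Fig. 1 is not reproduced here, so the shapes of $c_l(x,y)$ and $D_l(x,y)$ used below are an assumption modelled on Roy's outranking indices:

- $c_l(x,y)$ is 1 up to the indifference threshold $q_l$, 0 beyond the strict difference threshold $p_l$, and linear in between.
- $D_l(x,y)$ is 0 up to $p_l$, 1 from the veto threshold $v_l$ on, and linear in between.

The thresholds are taken as constants rather than functions of $c_l(x)$. The function names and numbers are hypothetical.

```python
def partial_concordance(diff, q, p):
    """Assumed shape: 1 if diff <= q, 0 if diff >= p, linear in between."""
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)


def discordance(diff, p, v):
    """Assumed shape: 0 if diff <= p, 1 if diff >= v (veto), linear in between."""
    if diff <= p:
        return 0.0
    if diff >= v:
        return 1.0
    return (diff - p) / (v - p)


def credibility(x, y, k, q, p, v):
    """Credibility r(x, y) of 'x is close to y' over the criteria of rule y."""
    diffs = [abs(xl - yl) for xl, yl in zip(x, y)]
    concordance = sum(
        kl * partial_concordance(d, ql, pl) for d, kl, ql, pl in zip(diffs, k, q, p)
    ) / sum(k)
    r = concordance
    for d, pl, vl in zip(diffs, p, v):
        d_l = discordance(d, pl, vl)
        if d_l > concordance:  # only criteria in the set L weaken the concordance
            r *= (1.0 - d_l) / (1.0 - concordance)
    return r


rule = (5, 4, 4)
k, q, p, v = (1, 1, 1), (0, 0, 0), (1, 1, 1), (2, 2, 2)
print(credibility((5, 4, 4), rule, k, q, p, v))  # 1.0: identical on all criteria
print(credibility((4, 4, 4), rule, k, q, p, v))  # 0.666...: one strict difference, no veto
print(credibility((3, 4, 4), rule, k, q, p, v))  # 0.0: the difference reaches the veto threshold
```

Unlike the distance $D$, agreement on the remaining criteria cannot compensate a single difference that reaches the veto threshold: the credibility drops to zero.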

The rules $y$ with the greatest values of $r(x,y)$ are presented to the DM, together with information about the strength of the corresponding categories.

Situation (iv) may also be ambiguous if the matched rules (deterministic or not) lead to different categories. In that case, the suggestion can be based either on the strength of the possible categories or on an analysis of the sorting examples which support each possible category. In the latter case, the suggested category is the one supported by the sorting example closest to the new object, in the sense of relation $R$.

3  Selection  of  Candidates  to  a  School 

To  illustrate  the  rough  set  learning  of  a  preferential  attitude,  let  us  consider  a 
simple  case  of  selection  of  candidates  to  a  school  (cf.  Moscarola  (1978)). 




The candidates to the school have submitted their application packages, with secondary school certificate, curriculum vitae and opinion from the previous school, for consideration by an admission committee. Based on these documents, the candidates were described using seven criteria (condition attributes). The list of these criteria, together with the corresponding scales ordered from the best to the worst value, is given below:

$c_1$ - score in mathematics, {5,4,3}

$c_2$ - score in physics, {5,4,3}

$c_3$ - score in English, {5,4,3}

$c_4$ - mean score in other subjects, {5,4,3}

$c_5$ - type of secondary school, {1,2,3}

$c_6$ - motivation, {1,2,3}

$c_7$ - opinion from previous school, {1,2,3}

Fifteen candidates with rather different application packages have been sorted by the committee after due consideration. This set of examples will be used to learn the preferential attitude of the committee.

The decision attribute $d$ makes a dichotomic partition $Y$ of the candidates: $d = A$ means admission and $d = R$ means rejection. The decision table with the fifteen candidates is shown in Table 1. Clearly, $C = \{c_1, c_2, c_3, c_4, c_5, c_6, c_7\}$ and $D = \{d\}$.

Let $Y_A$ be the set of candidates admitted and $Y_R$ the set of candidates rejected by the committee: $Y_A = \{x_1, x_4, x_5, x_7, x_8, x_{10}, x_{11}, x_{12}, x_{15}\}$, $Y_R = \{x_2, x_3, x_6, x_9, x_{13}, x_{14}\}$, $Y = \{Y_A, Y_R\}$. Sets $Y_A$ and $Y_R$ are $D$-definable sets in the decision table. There are 13 $C$-elementary sets: the couples of indiscernible candidates $\{x_4, x_{10}\}$ and $\{x_8, x_9\}$, and 11 discernible candidates. The $C$-lower and $C$-upper approximations of sets $Y_A$ and $Y_R$, and their $C$-boundaries, are equal, respectively, to:

$$\underline{C}Y_A = \{x_1, x_4, x_5, x_7, x_{10}, x_{11}, x_{12}, x_{15}\}$$

$$\overline{C}Y_A = \{x_1, x_4, x_5, x_7, x_8, x_9, x_{10}, x_{11}, x_{12}, x_{15}\}$$

$$Bn_C(Y_A) = \{x_8, x_9\}$$

$$\underline{C}Y_R = \{x_2, x_3, x_6, x_{13}, x_{14}\}$$

$$\overline{C}Y_R = \{x_2, x_3, x_6, x_8, x_9, x_{13}, x_{14}\}$$

$$Bn_C(Y_R) = \{x_8, x_9\}$$

The accuracy of approximation of sets $Y_A$ and $Y_R$ by $C$ is equal to 0.8 (8/10) and 0.71 (5/7), respectively. The quality of approximation of the decision by $C$ is equal to 0.87 (13/15).
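As a check on these figures, the following sketch recomputes the elementary sets, the approximations and the two measures from the values of Table 1, as reconstructed here. Accuracy is taken as $|\underline{C}Y|/|\overline{C}Y|$ and quality as the share of candidates lying in the lower approximations. This is a hypothetical illustration, not the software used by the author, and the function names are assumptions.

```python
from fractions import Fraction

# Table 1: candidate -> (c1, c2, c3, c4, c5, c6, c7, d)
TABLE = {
    "x1": (4, 4, 4, 4, 2, 2, 1, "A"),
    "x2": (3, 3, 4, 3, 2, 1, 1, "R"),
    "x3": (3, 4, 3, 3, 1, 2, 2, "R"),
    "x4": (5, 3, 5, 4, 2, 1, 2, "A"),
    "x5": (4, 4, 5, 4, 2, 2, 1, "A"),
    "x6": (3, 4, 3, 3, 2, 1, 3, "R"),
    "x7": (4, 4, 5, 4, 2, 2, 2, "A"),
    "x8": (4, 4, 4, 4, 2, 2, 2, "A"),
    "x9": (4, 4, 4, 4, 2, 2, 2, "R"),
    "x10": (5, 3, 5, 4, 2, 1, 2, "A"),
    "x11": (5, 4, 4, 4, 1, 1, 2, "A"),
    "x12": (5, 3, 4, 4, 2, 2, 2, "A"),
    "x13": (4, 3, 3, 3, 3, 2, 2, "R"),
    "x14": (3, 3, 4, 3, 2, 3, 3, "R"),
    "x15": (4, 5, 5, 4, 2, 1, 1, "A"),
}


def elementary_sets(table, criteria):
    """Classes of objects indiscernible on the given criteria (0-based column indices)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[i] for i in criteria), set()).add(obj)
    return list(classes.values())


def approximations(table, criteria, target):
    """Lower and upper approximation of the set `target` by the given criteria."""
    lower, upper = set(), set()
    for cls in elementary_sets(table, criteria):
        if cls <= target:
            lower |= cls  # class lies entirely inside the target set
        if cls & target:
            upper |= cls  # class overlaps the target set
    return lower, upper


C = range(7)
Y_A = {obj for obj, row in TABLE.items() if row[-1] == "A"}
Y_R = {obj for obj, row in TABLE.items() if row[-1] == "R"}
low_A, up_A = approximations(TABLE, C, Y_A)
low_R, up_R = approximations(TABLE, C, Y_R)

accuracy_A = Fraction(len(low_A), len(up_A))
accuracy_R = Fraction(len(low_R), len(up_R))
quality = Fraction(len(low_A) + len(low_R), len(TABLE))
print(len(elementary_sets(TABLE, C)), sorted(up_A - low_A))  # 13 ['x8', 'x9']
print(accuracy_A, accuracy_R, quality)                        # 4/5 5/7 13/15
```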

Let us observe that the $C$-doubtful region of the decision is composed of two candidates: $x_8$ and $x_9$. Indeed, they have the same values on the criteria from $C$, but the committee admitted $x_8$ and rejected $x_9$. This means that the decision is inconsistent with the evaluation of the candidates by the criteria from $C$. Apparently, the committee took into account additional information from the application packages of the candidates or from an interview with them. This conclusion suggests two options to the committee. One is to adopt an additional discriminatory criterion. The other, if an explicit definition of such a criterion would be too difficult, is to create a third category of candidates: those who should be invited to an interview.

Table 1. Decision table composed of sorting examples

                     Criterion
Candidate   c1  c2  c3  c4  c5  c6  c7    d
x1           4   4   4   4   2   2   1    A
x2           3   3   4   3   2   1   1    R
x3           3   4   3   3   1   2   2    R
x4           5   3   5   4   2   1   2    A
x5           4   4   5   4   2   2   1    A
x6           3   4   3   3   2   1   3    R
x7           4   4   5   4   2   2   2    A
x8           4   4   4   4   2   2   2    A
x9           4   4   4   4   2   2   2    R
x10          5   3   5   4   2   1   2    A
x11          5   4   4   4   1   1   2    A
x12          5   3   4   4   2   2   2    A
x13          4   3   3   3   3   2   2    R
x14          3   3   4   3   2   3   3    R
x15          4   5   5   4   2   1   1    A

The next step of the rough set analysis of the decision table is the construction of minimal subsets of independent criteria ensuring the same quality of sorting as the whole set $C$, i.e. the reducts of $C$. In our case, there are three such reducts:

$$RED^1_Y(C) = \{c_2, c_3, c_6, c_7\}$$

$$RED^2_Y(C) = \{c_1, c_3, c_7\}$$

$$RED^3_Y(C) = \{c_2, c_3, c_5, c_7\}$$

It can be said that the committee took the fifteen sorting decisions taking into account the criteria from one of the reducts and discarding all the remaining criteria. Let us notice that criterion $c_4$ has no influence at all on the decision, because it is not represented in any reduct.

It is interesting to see the intersection of all reducts, i.e. the core of criteria:

$$CORE_Y(C) = RED^1_Y(C) \cap RED^2_Y(C) \cap RED^3_Y(C) = \{c_3, c_7\}$$

The core is the most essential part of set $C$, i.e. it cannot be eliminated without disturbing the ability to approximate the decision.
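The reducts and the core can be verified by brute force on Table 1, since seven criteria give only 128 subsets. The same sketch computes the importance coefficients described in Section 2.2, i.e. quality with all criteria minus quality without the checked one. It is a hypothetical illustration, and the function names are assumptions. It relies on the quality of approximation being monotone in the set of criteria, which holds because adding a criterion can only refine the elementary sets.

```python
from fractions import Fraction
from itertools import combinations

# Rows of Table 1 (x1..x15): values of c1..c7 followed by the decision d.
ROWS = [
    (4, 4, 4, 4, 2, 2, 1, "A"), (3, 3, 4, 3, 2, 1, 1, "R"), (3, 4, 3, 3, 1, 2, 2, "R"),
    (5, 3, 5, 4, 2, 1, 2, "A"), (4, 4, 5, 4, 2, 2, 1, "A"), (3, 4, 3, 3, 2, 1, 3, "R"),
    (4, 4, 5, 4, 2, 2, 2, "A"), (4, 4, 4, 4, 2, 2, 2, "A"), (4, 4, 4, 4, 2, 2, 2, "R"),
    (5, 3, 5, 4, 2, 1, 2, "A"), (5, 4, 4, 4, 1, 1, 2, "A"), (5, 3, 4, 4, 2, 2, 2, "A"),
    (4, 3, 3, 3, 3, 2, 2, "R"), (3, 3, 4, 3, 2, 3, 3, "R"), (4, 5, 5, 4, 2, 1, 1, "A"),
]
N_CRITERIA = 7


def quality(rows, criteria):
    """Share of objects whose elementary set (on `criteria`) has a single decision."""
    decisions = {}
    for row in rows:
        decisions.setdefault(tuple(row[i] for i in criteria), []).append(row[-1])
    consistent = sum(len(ds) for ds in decisions.values() if len(set(ds)) == 1)
    return Fraction(consistent, len(rows))


def reducts(rows, n):
    """Minimal criterion subsets keeping the quality of the full set (brute force)."""
    full = quality(rows, range(n))
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a superset of a reduct is not minimal
            if quality(rows, subset) == full:
                found.append(subset)
    return found


def names(indices):
    return "{" + ", ".join(f"c{i + 1}" for i in indices) + "}"


full = quality(ROWS, range(N_CRITERIA))
red = reducts(ROWS, N_CRITERIA)
core = sorted(set.intersection(*(set(r) for r in red)))
# Importance coefficient of c_l: drop in quality when c_l is removed from C.
importance = {
    f"c{l + 1}": full - quality(ROWS, [i for i in range(N_CRITERIA) if i != l])
    for l in range(N_CRITERIA)
}
print([names(r) for r in red])  # ['{c1, c3, c7}', '{c2, c3, c5, c7}', '{c2, c3, c6, c7}']
print(names(core), importance["c3"], importance["c7"], importance["c4"])  # {c3, c7} 1/15 1/15 0
```

Only the core criteria $c_3$ and $c_7$ receive a non-zero coefficient here, which is consistent with their being indispensable.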

In a real case, all the reducts and the core should be submitted to the committee for consideration, in order to obtain its opinion about which reduct should be used to generate sorting rules from the reduced decision table.

Let us suppose that the committee has chosen reduct $RED^2_Y(C)$, composed of $c_1, c_3, c_7$, i.e. the scores in mathematics and English and the opinion from the previous school. This choice could be explained by the fact that the committee considers the score in mathematics ($c_1$) more important than the score in physics ($c_2$) combined with either the type of secondary school ($c_5$) or motivation ($c_6$).


Now, the decision table can be reduced to the criteria represented in $RED^2_Y(C)$. The sorting rules generated from the reduced decision table have the following form:

rule #1: if $c_1 = 5$ then $d = A$

rule #2: if $c_3 = 5$ then $d = A$

rule #3: if $c_1 = 4$ and $c_7 = 1$ then $d = A$

rule #4: if $c_1 = 4$ and $c_3 = 4$ and $c_7 = 2$ then $d = A$ or $R$

rule #5: if $c_1 = 3$ then $d = R$

rule #6: if $c_3 = 3$ then $d = R$

Five rules are deterministic and one is non-deterministic.

The non-deterministic rule #4 follows from the indiscernibility of candidates $x_8$ and $x_9$, which belong to different categories of decision. It defines a profile of candidates who should form the third category of decision, e.g. those who should be invited to an interview.

The rules clearly represent the following policy of the selection committee:

Admit all candidates having score 5 in mathematics or in English. Admit also those who have score 4 in mathematics and in English but a very good opinion from the previous school. In the case of score 4 in mathematics and in English but only a moderate opinion from the previous school, invite the candidate to an interview. Candidates having score 3 in mathematics or in English are to be rejected.

The sample of fifteen sorting decisions has been treated as a training sample and used to reveal the preferential attitude of the committee. The global preference model represented by the set of 6 sorting rules could now be applied to support the selection of new candidates, as suggested in Section 2.2 of this paper.
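To make the matching procedure of Section 2.2 concrete, the following sketch encodes the six rules above. For a new candidate, it reports which of the situations (i)-(iv) applies and which rules were matched. The rule encoding, the function `match` and the candidate descriptions are hypothetical. In situations (iii) and (iv) the DM would then be shown the 'nearest' rules or the strengths, which the sketch does not compute.

```python
# Sorting rules #1-#6 derived from RED^2_Y(C): (conditions, possible decisions).
RULES = [
    ({"c1": 5}, {"A"}),
    ({"c3": 5}, {"A"}),
    ({"c1": 4, "c7": 1}, {"A"}),
    ({"c1": 4, "c3": 4, "c7": 2}, {"A", "R"}),
    ({"c1": 3}, {"R"}),
    ({"c3": 3}, {"R"}),
]


def match(candidate, rules=RULES):
    """Return the situation (i)-(iv) of Section 2.2 and the numbers of the matched rules."""
    matched = [
        number
        for number, (conditions, _) in enumerate(rules, start=1)
        if all(candidate.get(c) == value for c, value in conditions.items())
    ]
    if not matched:
        return "iii", matched  # no rule matches: look for the 'nearest' rules
    if len(matched) > 1:
        return "iv", matched   # several rules match
    decisions = rules[matched[0] - 1][1]
    return ("i" if len(decisions) == 1 else "ii"), matched


# Hypothetical new candidates, described on the criteria of the chosen reduct.
print(match({"c1": 3, "c3": 4, "c7": 2}))  # ('i', [5])
print(match({"c1": 4, "c3": 4, "c7": 2}))  # ('ii', [4])
print(match({"c1": 4, "c3": 4, "c7": 3}))  # ('iii', [])
print(match({"c1": 5, "c3": 3, "c7": 2}))  # ('iv', [1, 6])
```

The last candidate shows an ambiguous case (iv): rule #1 suggests admission while rule #6 suggests rejection.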

4  Concluding  Remarks 

The aim of this paper was to show that rough set theory is a useful tool for discovering the preferential attitude of the DM in multi-criteria decision making problems, in particular multi-criteria sorting problems.

We claim that a global preference model in the form of rules derived from a set of examples has an advantage over a functional or a relational model. The reason is that it explains the preferential attitude through important facts expressed in terms of significant criteria only. The rules are well-founded by examples and, moreover, inconsistencies manifested in the examples are neither corrected nor aggregated by a global function or relation. The rough set approach does not need any additional information, such as probability in statistics or grade of membership in fuzzy set theory. It is conceptually simple and needs simple algorithms.

The set of rules can be used to support decisions concerning newly arriving objects. Decision support is provided by matching a new object to one of the rules; if this fails, a set of the 'nearest' rules is presented to the DM. In this paper, in order to find the 'nearest' rules, a new distance measure based on a valued closeness relation has been proposed. It involves concordance and discordance tests for threshold comparisons of the new object with each particular rule.


References 

Fibak, J., Pawlak, Z., Slowinski, K., Slowinski, R. (1986). Rough sets based decision algorithm for treatment of duodenal ulcer by HSV. Bulletin of the Polish Academy of Sciences, ser. Biological Sciences, 34, 227-246.

Grzymala-Busse, J.W. (1992). LERS - a system for learning from examples based on rough sets. In: Slowinski, ed. (1992), pp. 3-18.

Krysinski, J. (1990). Rough sets approach to the analysis of the structure-activity relationship of quaternary imidazolium compounds. Arzneimittel-Forschung/Drug Research 40 (II), 795-799.

Michalski, R.S. (1983). A theory and methodology of inductive learning. In: Machine Learning (R.S. Michalski, J.G. Carbonell, T.M. Mitchell, eds.), Morgan Kaufmann, pp. 83-134.

Moscarola, J. (1978). Multicriteria decision aid - two applications in education management. In: Multiple Criteria Problem Solving (S. Zionts, ed.), Lecture Notes in Economics and Mathematical Systems, vol. 155, Springer-Verlag, Berlin, pp. 402-423.

Nowicki, R., Slowinski, R., Stefanowski, J. (1992). Rough sets analysis of diagnostic capacity of vibroacoustic symptoms. Journal of Computers & Mathematics with Applications 24, 109-123.

Pawlak, Z. (1982). Rough Sets. International Journal of Information & Computer Sciences 11, 341-356.

Pawlak, Z. (1991). Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht/Boston/London.

Roubens, M., Vincke, Ph. (1985). Preference Modelling. Lecture Notes in Economics and Mathematical Systems, vol. 250, Springer-Verlag, Berlin.

Roy, B. (1985). Méthodologie Multicritère d'Aide à la Décision. Economica, Paris.

Roy, B. (1989). Main sources of inaccurate determination, uncertainty and imprecision in decision models. Math. Comput. Modelling, 12, 1245-1254.

Roy, B. (1992). Decision science or decision aid science. European Journal of Operational Research, Special Issue on Model Validation in Operations Research (to appear).

Slowinski, R., ed. (1992). Intelligent Decision Support. Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, Dordrecht/Boston/London.

Slowinski, K., Slowinski, R., Stefanowski, J. (1988). Rough sets approach to analysis of data from peritoneal lavage in acute pancreatitis. Medical Informatics 13, 143-159.

Slowinski, R., Stefanowski, J. (1992). 'RoughDAS' and 'RoughClass' software implementations of the rough sets approach. In: Slowinski, ed. (1992), pp. 445-456.

Stefanowski, J. (1992). Classification support based on the rough sets theory. Proc. IIASA Workshop on User-Oriented Methodology and Techniques of Decision Analysis and Support, Serock, Sept. 9-13, 1991 (to be published by Springer-Verlag).


Authors  Index 


Agusti, J. 245
Ambroszkiewicz, S. 518
Bagai, R. 415
Baroglio, C. 425
Bateman, M. 450
Berlandier, P. 375
Bittencourt, G. 538
Bose, P. 209
Brown Jr., A.L. 362
Bry, F. 116
Busch, D.R. 29
Casein, S. 496
Chadha, R. 255
Charlton, P. 612
Chau, C.W.R. 306
Chen, J. 152
Chou, S.C. 415
Chu, B.-T.B. 591
Chu, H. 19
De Raedt, L. 435
Dewan, H.M. 186
Di Manzo, M. 548
Du, H. 591
Eloff, J.H.P. 106
Franova, M. 476
Gaasterland, T. 198
Gan, H. 466
Giordana, A. 425
Giordano, L. 59
Giunchiglia, E. 548
Gross, M. 476
Grosz, G. 486
Hahnle, R. 49
Hesketh, J. 245
Imam, I.F. 395
Iwatani, T. 285
Klauck, C. 571
Klut, J.P. 106
Kodratoff, Y. 476
Lambrix, P. 162
Lavrac, N. 435
Lee, S.-J. 76
Leone, N. 235
Levy, J. 245
Liau, C.J. 316
Lietard, L. 209
Lin, B.I-P. 316
Lingras, P. 306
Ljung, L. 338
Lobo, J. 198
Lopez, B. 96
Lounis, H. 405
Lowry, M.R. 219
Mantha, S. 362
Marek, V.W. 142
Martin, S. 450
Meyer, M.A. 385
Michalski, R.S. 395
Minker, J. 1
Muller, J.P. 385
Murray, N.V. 275
Natali, A. 496
Nebel, B. 132
Okamoto, W. 285
Orman, L.V. 172
Padgham, L. 132
Palopoli, L. 235
Pivert, O. 209
Plaisted, D.A. 19, 255
Plaza, E. 96
Posegga, J. 39
Puget, J.-F. 350
Rajasekar, A. 265
Rasiowa, H. 142
Rauszer, C.M. 326
Robertson, D. 245
Rolland, C. 486
Romeo, M. 235
Ronnquist, R. 162
Rosenthal, E. 275
Ruiz, C. 1
Saitta, L. 425
Sandewall, E. 558
Schaerf, A. 508
Schwagereit, J. 571
Shanbhogue, V. 415
Shue, L.-Y. 69
Skowron, A. 295
Slade, A. 450
Slowinski, R. 642
Sobolewski, M. 601
Stolfo, S.J. 186
Tano, S. 285
Thirunarayan, K. 528
Valiente, G. 86
Wakayama, T. 362
Wang, K. 581
Wong, S.K.M. 306
Wu, C.-H. 76
Wüthrich, B. 622
Zamani, R. 69
Zanichelli, F. 496
Zembowicz, R. 632
Zytkow, J.M. 415, 632

Springer-Verlag 
and  the  Environment 


We at Springer-Verlag firmly believe that an international science publisher has a special obligation to the environment, and our corporate policies consistently reflect this conviction.

We also expect our business partners - paper mills, printers, packaging manufacturers, etc. - to commit themselves to using environmentally friendly materials and production processes.

The paper in this book is made from low- or no-chlorine pulp and is acid free, in conformance with international standards for paper permanency.


Lecture  Notes  in  Artificial  Intelligence  (LNAI) 


Vol.  513:  N.  M.  Mattos,  An  Approach  to  Knowledge  Base 
Management.  IX,  247  pages.  1991. 

Vol. 515: J. P. Martins, M. Reinfrank (Eds.), Truth Maintenance Systems. Proceedings, 1990. VII, 177 pages. 1991.

Vol. 517: K. Nökel, Temporally Distributed Symptoms in Technical Diagnosis. IX, 164 pages. 1991.

Vol. 518: J. G. Williams, Instantiation Theory. VIII, 133 pages. 1991.

Vol. 522: J. Hertzberg (Ed.), European Workshop on Planning. Proceedings, 1991. VII, 121 pages. 1991.

Vol. 535: P. Jorrand, J. Kelemen (Eds.), Fundamentals of Artificial Intelligence Research. Proceedings, 1991. VIII, 255 pages. 1991.

Vol. 541: P. Barahona, L. Moniz Pereira, A. Porto (Eds.), EPIA '91. Proceedings, 1991. VIII, 292 pages. 1991.

Vol. 542: Z. W. Raś, M. Zemankova (Eds.), Methodologies for Intelligent Systems. Proceedings, 1991. X, 644 pages. 1991.

Vol. 543: J. Dix, K. P. Jantke, P. H. Schmitt (Eds.), Nonmonotonic and Inductive Logic. Proceedings, 1990. X, 243 pages. 1991.

Vol. 546: O. Herzog, C.-R. Rollinger (Eds.), Text Understanding in LILOG. XI, 738 pages. 1991.

Vol. 549: E. Ardizzone, S. Gaglio, F. Sorbello (Eds.), Trends in Artificial Intelligence. Proceedings, 1991. XIV, 479 pages. 1991.

Vol. 565: J. D. Becker, I. Eisele, F. W. Mündemann (Eds.), Parallelism, Learning, Evolution. Proceedings, 1989. VIII, 525 pages. 1991.

Vol. 567: H. Boley, M. M. Richter (Eds.), Processing Declarative Knowledge. Proceedings, 1991. XII, 427 pages. 1991.

Vol. 568: H.-J. Bürckert, A Resolution Principle for a Logic with Restricted Quantifiers. X, 116 pages. 1991.

Vol. 587: R. Dale, E. Hovy, D. Rösner, O. Stock (Eds.), Aspects of Automated Natural Language Generation. Proceedings, 1992. VIII, 311 pages. 1992.

Vol. 590: B. Fronhöfer, G. Wrightson (Eds.), Parallelization in Inference Systems. Proceedings, 1990. VIII, 372 pages. 1992.

Vol. 592: A. Voronkov (Ed.), Logic Programming. Proceedings, 1991. IX, 514 pages. 1992.

Vol. 596: L.-H. Eriksson, L. Hallnäs, P. Schroeder-Heister (Eds.), Extensions of Logic Programming. Proceedings, 1991. VII, 369 pages. 1992.

Vol. 597: H. W. Guesgen, J. Hertzberg, A Perspective of Constraint-Based Reasoning. VIII, 123 pages. 1992.

Vol. 599: Th. Wetter, K.-D. Althoff, J. Boose, B. R. Gaines, M. Linster, F. Schmalhofer (Eds.), Current Developments in Knowledge Acquisition - EKAW '92. Proceedings. XIII, 444 pages. 1992.

Vol. 604: F. Belli, F. J. Radermacher (Eds.), Industrial and Engineering Applications of Artificial Intelligence and Expert Systems. Proceedings, 1992. XV, 702 pages. 1992.

Vol. 607: D. Kapur (Ed.), Automated Deduction - CADE-11. Proceedings, 1992. XV, 793 pages. 1992.

Vol. 610: F. von Martial, Coordinating Plans of Autonomous Agents. XII, 246 pages. 1992.

Vol. 611: M. P. Papazoglou, J. Zeleznikow (Eds.), The Next Generation of Information Systems: From Data to Knowledge. VIII, 310 pages. 1992.

Vol. 617: V. Mařík, O. Štěpánková, R. Trappl (Eds.), Advanced Topics in Artificial Intelligence. Proceedings, 1992. IX, 484 pages. 1992.

Vol. 619: D. Pearce, H. Wansing (Eds.), Nonclassical Logics and Information Processing. Proceedings, 1990. VII, 171 pages. 1992.

Vol. 622: F. Schmalhofer, G. Strube, Th. Wetter (Eds.), Contemporary Knowledge Engineering and Cognition. Proceedings, 1991. XII, 258 pages. 1992.

Vol. 624: A. Voronkov (Ed.), Logic Programming and Automated Reasoning. Proceedings, 1992. XIV, 509 pages. 1992.

Vol. 627: J. Pustejovsky, S. Bergler (Eds.), Lexical Semantics and Knowledge Representation. Proceedings, 1991. XII, 381 pages. 1992.

Vol. 633: D. Pearce, G. Wagner (Eds.), Logics in AI. Proceedings. VIII, 410 pages. 1992.

Vol. 636: G. Comyn, N. E. Fuchs, M. J. Ratcliffe (Eds.), Logic Programming in Action. Proceedings, 1992. X, 324 pages. 1992.

Vol. 638: A. F. Rocha, Neural Nets. A Theory for Brains and Machines. XV, 393 pages. 1992.

Vol. 642: K. P. Jantke (Ed.), Analogical and Inductive Inference. Proceedings, 1992. VIII, 319 pages. 1992.

Vol. 659: G. Brewka, K. P. Jantke, P. H. Schmitt (Eds.), Nonmonotonic and Inductive Logic. Proceedings, 1991. VIII, 332 pages. 1993.

Vol. 660: E. Lamma, P. Mello (Eds.), Extensions of Logic Programming. Proceedings, 1992. VIII, 417 pages. 1993.

Vol. 667: P. B. Brazdil (Ed.), Machine Learning: ECML-93. Proceedings, 1993. XII, 471 pages. 1993.

Vol. 671: H. J. Ohlbach (Ed.), GWAI-92: Advances in Artificial Intelligence. Proceedings, 1992. XI, 397 pages. 1993.

Vol. 689: J. Komorowski, Z. W. Raś (Eds.), Methodologies for Intelligent Systems. Proceedings, 1993. XI, 653 pages. 1993.


Lecture  Notes  in  Computer  Science 


Vol. 651: R. Koymans, Specifying Message Passing and Time-Critical Systems with Temporal Logic. IX, 164 pages. 1992.

Vol. 652: R. Shyamasundar (Ed.), Foundations of Software Technology and Theoretical Computer Science. Proceedings, 1992. XIII, 405 pages. 1992.

Vol. 653: A. Bensoussan, J.-P. Verjus (Eds.), Future Tendencies in Computer Science, Control and Applied Mathematics. Proceedings, 1992. XV, 371 pages. 1992.

Vol. 654: A. Nakamura, M. Nivat, A. Saoudi, P. S. P. Wang, K. Inoue (Eds.), Parallel Image Analysis. Proceedings, 1992. VIII, 312 pages. 1992.

Vol. 655: M. Bidoit, C. Choppy (Eds.), Recent Trends in Data Type Specification. X, 344 pages. 1993.

Vol. 656: M. Rusinowitch, J. L. Rémy (Eds.), Conditional Term Rewriting Systems. Proceedings, 1992. XI, 501 pages. 1993.

Vol. 657: E. W. Mayr (Ed.), Graph-Theoretic Concepts in Computer Science. Proceedings, 1992. VIII, 350 pages. 1993.

Vol. 658: R. A. Rueppel (Ed.), Advances in Cryptology - EUROCRYPT '92. Proceedings, 1992. X, 493 pages. 1993.

Vol. 659: G. Brewka, K. P. Jantke, P. H. Schmitt (Eds.), Nonmonotonic and Inductive Logic. Proceedings, 1991. VIII, 332 pages. 1993. (Subseries LNAI).

Vol. 660: E. Lamma, P. Mello (Eds.), Extensions of Logic Programming. Proceedings, 1992. VIII, 417 pages. 1993. (Subseries LNAI).

Vol. 661: S. J. Hanson, W. Remmele, R. L. Rivest (Eds.), Machine Learning: From Theory to Applications. VIII, 271 pages. 1993.

Vol. 662: M. Nitzberg, D. Mumford, T. Shiota, Filtering, Segmentation and Depth. VIII, 143 pages. 1993.

Vol. 663: G. v. Bochmann, D. K. Probst (Eds.), Computer Aided Verification. Proceedings, 1992. IX, 422 pages. 1993.

Vol. 664: M. Bezem, J. F. Groote (Eds.), Typed Lambda Calculi and Applications. Proceedings, 1993. VIII, 433 pages. 1993.

Vol. 665: P. Enjalbert, A. Finkel, K. W. Wagner (Eds.), STACS 93. Proceedings, 1993. XIV, 724 pages. 1993.

Vol. 666: J. W. de Bakker, W.-P. de Roever, G. Rozenberg (Eds.), Semantics: Foundations and Applications. Proceedings, 1992. VIII, 659 pages. 1993.

Vol. 667: P. B. Brazdil (Ed.), Machine Learning: ECML-93. Proceedings, 1993. XII, 471 pages. 1993. (Subseries LNAI).

Vol. 668: M.-C. Gaudel, J.-P. Jouannaud (Eds.), TAPSOFT '93: Theory and Practice of Software Development. Proceedings, 1993. XII, 762 pages. 1993.

Vol. 669: R. S. Bird, C. C. Morgan, J. C. P. Woodcock (Eds.), Mathematics of Program Construction. Proceedings, 1992. VIII, 378 pages. 1993.

Vol. 670: J. C. P. Woodcock, P. G. Larsen (Eds.), FME '93: Industrial-Strength Formal Methods. Proceedings, 1993. XI, 689 pages. 1993.

Vol. 671: H. J. Ohlbach (Ed.), GWAI-92: Advances in Artificial Intelligence. Proceedings, 1992. XI, 397 pages. 1993. (Subseries LNAI).

Vol. 672: A. Barak, S. Guday, R. G. Wheeler, The MOSIX Distributed Operating System. X, 221 pages. 1993.

Vol. 673: G. Cohen, T. Mora, O. Moreno (Eds.), Applied Algebra, Algebraic Algorithms and Error-Correcting Codes. Proceedings, 1993. X, 355 pages. 1993.

Vol. 674: G. Rozenberg (Ed.), Advances in Petri Nets 1993. VII, 457 pages. 1993.

Vol. 675: A. Mulkers, Live Data Structures in Logic Programs. VIII, 220 pages. 1993.

Vol. 676: Th. H. Reiss, Recognizing Planar Objects Using Invariant Image Features. X, 180 pages. 1993.

Vol. 677: H. Abdulrab, J.-P. Pécuchet (Eds.), Word Equations and Related Topics. Proceedings, 1991. VII, 214 pages. 1993.

Vol. 678: F. Meyer auf der Heide, B. Monien, A. L. Rosenberg (Eds.), Parallel Architectures and Their Efficient Use. Proceedings, 1992. XII, 227 pages. 1993.

Vol. 683: G. J. Milne, L. Pierre (Eds.), Correct Hardware Design and Verification Methods. Proceedings, 1993. VIII, 270 pages. 1993.

Vol. 684: A. Apostolico, M. Crochemore, Z. Galil, U. Manber (Eds.), Combinatorial Pattern Matching. Proceedings, 1993. VIII, 165 pages. 1993.

Vol. 685: C. Rolland, F. Bodart, C. Cauvet (Eds.), Advanced Information Systems Engineering. Proceedings, 1993. XI, 650 pages. 1993.

Vol. 686: J. Mira, J. Cabestany, A. Prieto (Eds.), New Trends in Neural Computation. Proceedings, 1993. XII, 746 pages. 1993.

Vol. 687: H. H. Barrett, A. F. Gmitro (Eds.), Information Processing in Medical Imaging. Proceedings, 1993. XVI, 567 pages. 1993.

Vol. 688: M. Gauthier (Ed.), Ada: Without Frontiers. Proceedings, 1993. VIII, 353 pages. 1993.

Vol. 689: J. Komorowski, Z. W. Raś (Eds.), Methodologies for Intelligent Systems. Proceedings, 1993. XI, 653 pages. 1993. (Subseries LNAI).


